ABSTRACT


Conventional parsing techniques use grammars as embedded
procedural knowledge bases in mechanisms which are capable
of translating words in the language defined into equivalent
parse trees. The approach described in this paper uses
context-free grammars as data, allowing access to synthesis
templates which enable the user to create and interact with
parse trees directly. The advantages of this approach are
the utility of human-oriented grammars, the dynamic
interchangeability of language definitions, immediate error
rejection, and the ability to handle partially complete parse
trees. The design for a prototype programming environment
using grammar-driven synthesis is presented.





TABLE OF CONTENTS


I.   INTRODUCTION
II.  GRAMMAR-DRIVEN SYNTHESIS
     A. INTRODUCTION
     B. GRAMMARS AND SENTENTIAL FORMS
     C. A SIMPLE GRAMMAR-DRIVEN STRING EDITOR
     D. AN IMPROVED GRAMMAR-DRIVEN STRING EDITOR
     E. TREE SYNTHESIS
     F. COMPARISON OF GRAMMAR-UTILIZATION TECHNOLOGIES
III. CONCEPTUAL DESIGN FOR GDE
     A. INTRODUCTION
     B. TRANSFORMATIONS
     C. DISPLAY SCHEMAS
     D. THE LANGUAGE DEFINITION MODULE
IV.  PROGRAMS AS DATABASES
     A. INTRODUCTION
     B. PROGRAMS AS COMPLEX RELATIONSHIPS
     C. DECOMPOSITION OF THE EVALUATION RELATION
     D. CONTROL STRUCTURE
     E. STRUCTURED PROGRAMMING SYSTEMS
     F. PHYSICAL REPRESENTATION OF A TREE-STRUCTURED PROGRAM
     G. PROCEDURAL REPRESENTATION OF DATA
     H. SUMMARY
V.   A PROTOTYPE SYSTEM DESIGN
     A. SYSTEM MODULES
     B. PRE-EXISTING MODULES
     C. SUBSYSTEM SELECTION
VI.  SUMMARY
     A. CONCLUSIONS
     B. WORK IN PROGRESS
     C. FUTURE RESEARCH DIRECTIONS
APPENDIX A: NOTATIONAL SYSTEMS FOR CONTEXT-FREE GRAMMARS
APPENDIX B: A GRAMMAR FOR PASCAL IN R-ARGOT
APPENDIX C: TRANSFORMATION TEMPLATE GRAMMAR
APPENDIX D: INTERMEDIATE-LEVEL LANGUAGE DEFINITION
APPENDIX E: ILD GRAMMAR LANGUAGE DEFINITION
APPENDIX F: MEMORANDUM LANGUAGE DEFINITION
APPENDIX G: SYSTEM PREDEFINED FUNCTIONS
APPENDIX H: FIGURES
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST







I. INTRODUCTION


There is a great deal of interest in the improvement of
program and system development efficiency, primarily because
software costs have risen dramatically in recent years as a
fraction of total system development costs. One approach to
the improvement of efficiency is the provision of an
enhanced set of interactive program development tools for
the programmer and the increased automation of program
development. Many such efforts involve the notion of a
"programming environment", that is, an interactive
environment in which a wide selection of software tools is
provided as an integrated package, with a consistent and
relatively concise command structure. Typically, a means is
provided to allow the programmer to work within the language
being used for the program, without having to descend to the
object language level to perform any of the functions
necessary to create, modify, or test the program.

As a concrete example, the reader's attention is drawn
to the most widely-known integrated programming environment,
the APL system [Iverson, 1962]. When using this system, the
programmer is able to perform all steps in the program
development process without ever having to issue explicit
commands to the host operating system. The APL environment
itself provides an integrated set of facilities for storing,
editing, and debugging modules which are arranged in
workspaces and libraries, access to which is available using
commands that are part of the APL language definition
itself. In addition, so far as the user is concerned, there
is no notion of translating, linking, or loading individual
functions or programs. To the programmer, the system appears
to be capable of evaluating programs written in APL without
translation, and all of the programmer's interactions with
the APL programs defined occur within the syntactic
framework of the original source language.

Other language-oriented programming environments are
under development or in use, notably the ECL project at
Harvard [Wegbreit et al., 1974], which is based on a
LISP-like programming language, and the GANDALF project
[Habermann, 1979], which is based on the new Department of
Defense language, ADA. Both of these projects are designed
to offer an environment which is even more intensively
syntax-oriented than that offered by APL. In addition,
these systems incorporate into an integrated environment a
wide range of facilities normally provided by the host
operating system. The two human engineering ideas
motivating the design of such systems are to free the
programmer from the necessity of learning two command
structures, and to let the programmer reference and access
parts of the modules being developed using the natural
structure imposed by the syntax of the language in which
they are written.





One of the crucial problems which must be solved in
implementing such an environment is the need to provide more
or less continual access to the evaluable program structure
in a syntax-oriented fashion. Conceptually, the system must
"understand" the syntactical structure of the program during
its entire existence, not simply during the phase in which
it is entered into the system. Thus, the internal structure
of the program must be sufficiently complex to reflect the
syntax of the program at all times, and facilities to
utilize this structure must be on-line during the entire
period of program development. Since such a requirement
must be met for other reasons, a syntax-directed editor is
often offered as the primary means of program entry. Such
an editor utilizes the on-line knowledge of program
structure to allow additions, deletions, and modifications
of the program structure to be made based on the natural
syntactical units of the program, rather than by the more
usual line-oriented approach.

Our research was originally motivated by this
application to syntax-directed editing, since the program
access algorithms for the editor are the very routines
involved in program structure access throughout its life in
the programming environment. We wished to investigate the
task of generating a syntax-directed editor from a grammar
description, in the hope that procedures for routinely
performing such a task could be described in general terms,
if not altogether automated. The belief that a set of
usable rules could be found was encouraged by the fact that
techniques for generating a functionally analogous system, a
parser, from a BNF grammar description are well understood
and, in fact, frequently automated.

The techniques reported in this paper are fundamentally
very simple, but lie in a direction diametrically opposed to
those involved in parser generation. A parser is a
mechanism for taking a correct word in some language and
recreating, from the grammar of the language, the
syntactical structure inherent in that word. That this
structure can be deduced from what would otherwise be a
meaningless string of symbols is a consequence of the fact
that the programmer used a grammar to create it that was
equivalent to that used by the creator of the parser. The
program itself represents a sequentialized version of
parallel, hierarchical structures, one in the mind of the
programmer, and the other internal to the computer system.
The programmer has encoded the structure into the message,
and the parser is the mechanism needed to decode it.

Viewed in this light, the use of a parser-based
translation system is a very odd solution indeed to the
problem of entering a program structure into a computer
system for subsequent execution: it is as if a piano were
to be moved into a house by tearing it into small
pieces, appropriately labelling each one, pushing the pieces
through a mail slot, and relying on an automaton inside the
house to reassemble the piano. This procedure is
notoriously error-prone, and once it is accomplished, it is
extremely difficult for the programmer to gain access in a
human-oriented way to the actual structure built. Extending
the simile used above, it is as if we could only confirm
that the piano had been reconstructed properly by listening
to the music emanating from the interior of the house after
the piano had been reassembled.

Of course, the historical cause for such a solution is
clear: most general-purpose computing systems, at the time
language translation technology was elaborated, relied
heavily on sequential, batch-oriented input mechanisms such
as card readers, and were like houses without front doors,
only mail slots. There was a driving need to invent such
mechanisms as parsers so that high-level programming could
be done at all.

However, with the increased reliance on interactive,
remote-entry time-sharing facilities, a radically different
solution to the problem of program entry can be
investigated. The program structure can be interactively
built within the computer in the first place. Such a
solution obviates the need for a parser altogether.
Instead, the editor and the programmer cooperate to build
the desired structure directly. The grammatical
specifications of the language are not used indirectly, to
build a decoder for an unnecessary representation, but are
used simply as data to guide an appropriate, direct
synthesis of a well-structured program representation.

This thesis describes such mechanisms in enough detail
to serve as the basis for the implementation of a
language-independent program entry system. The system is
language-independent in the sense that data corresponding very
closely to the grammar of a context-free language itself, in
the form of a finite set of static "transformations", is
directly interpreted by the system to form structures
well-formed under that grammar. If the grammar data is
changed, the same system supports a new language.

We have adopted the term "grammar-driven synthesis" to
describe the function of the systems discussed in this
paper, in order to suggest the idea that grammars with a
rich set of operators are utilized as knowledge bases with
little or no pre-processing. This direct utilization of a
human-oriented grammar is to be contrasted, for instance,
with the extensive pre-processing required to derive
transition tables for driving a shift-reduce parser.

Chapter II describes in very general terms several basic
mechanisms for performing such grammar-driven synthesis,
relating them to the fundamental idea of performing a valid
derivation under a context-free grammar. Chapter III
provides a further elaboration of these mechanisms, aimed
toward the more concrete goal of being able not only to
create, but also to modify and delete parts of a
hierarchical program structure in a syntactically
consistent way. Chapter IV, which is something of a
digression, considers from the viewpoint of database design
how programs may be represented and accessed as databases
during modification and during storage or transmission from
one place or time to another. In Chapter V, a conceptual
description is presented of a prototype programming
environment, designed to allow the programming language in
use to be changed by simply changing the language
description installed in the system. This design is
concerned solely with the facilities for program
modification and entry, and is based on the assumption that
a means for describing in a relatively simple way the
semantic content of the program structures to be built can
be found. Finally, in Chapter VI, the results of the
research undertaken so far are summarized, and some
suggestions for future investigations are made.
II. GRAMMAR-DRIVEN SYNTHESIS


A. INTRODUCTION 

In this chapter, several models for grammar-driven
editors of increasing complexity are described in terms of
the theory of context-free grammars. Each editor receives
two sequences of input symbols, the first representing a
context-free grammar, and the second a series of commands
which guides the synthesis of a sentential form of the
grammar initially provided. The described mechanisms are
capable of utilizing very general classes of context-free
grammars, including ambiguous and incomplete grammars as
well as grammars with useless productions (i.e., productions
which do not occur in the derivation sequence for any word
of the defined language). For this reason, we adopt the view
that the fundamental product of such a synthesizer is a
sentential form, possibly containing non-terminal as well as
terminal symbols.

The first syntax-directed editor produced by the
research group along the lines outlined in this section was
written by B. MacLennan in November, 1980 in LISP and called
"A Universal Syntax-Directed Editor". The primary motivation
for the analysis of grammar-driven synthesis presented
in this chapter was to perform an exhaustive review of the
algorithms employed and to connect them to the mathematical
theory of context-free grammars in such a way as to justify
the adjective "universal", as well as to provide reasonably
convincing informal arguments that no critical loopholes had
been missed. This technology for using a grammar is
compared with conventional parsing techniques, and the
feasibility of using such synthesizers as the foundation of a
system providing interactive access to a hierarchically
organized database (such as that representing an executable
program structure) is discussed.


B. GRAMMARS AND SENTENTIAL FORMS 
It is assumed that the reader is familiar with the
Backus-Naur Form, or BNF, notation for mathematical grammars.
Appendix A contains a formal specification for this
notational system. The basic concepts from the theory of
context-free grammars used throughout this section are
adapted from [Hopcroft and Ullman, 1979]. The present
section is provided primarily for background and continuity.
A context-free grammar has the following elements:
-- A finite set T of terminal symbols,
-- A finite set N of non-terminal symbols,
   disjoint from T,
-- A finite set P of productions, each expressed
   in BNF notation,
-- A designated target non-terminal t
   included in N.
In addition, for the grammar to be context-free, every
production must be of the form
     <a> ::= X
where X is a string (possibly empty) of terminal and
non-terminal symbols, and a is a non-terminal symbol. The
acronym "CFG" is commonly used to abbreviate the phrase
"context-free grammar". Throughout this chapter, we will
adopt the convention of using lower-case letters from the
beginning of the alphabet to represent non-terminal symbols,
lower-case letters from the end of the alphabet to represent
terminal symbols, and upper-case letters to represent
strings (possibly empty) of terminals and non-terminals.
Since we will be considering only context-free grammars, the
term "grammar" will always be understood to mean
"context-free grammar". We shall also assume that all grammars
considered are non-trivial, that is, that the sets T and P
are non-empty.
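The four-part definition above maps directly onto a small data structure. The following Python sketch is illustrative only: the class name Grammar, the method problems, and the representation of a production <a> ::= X as a pair (a, X) are assumptions made here, not part of the mechanisms described in this thesis. It checks the conditions just listed: T and N disjoint, t in N, every left-hand side a single member of N, every right-hand-side symbol drawn from T or N, and T and P non-empty.

```python
from dataclasses import dataclass

# A production <a> ::= X is stored as the pair (a, X), where X is a tuple of
# symbols that may be empty.
Production = tuple[str, tuple[str, ...]]


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset[str]            # T
    nonterminals: frozenset[str]         # N
    productions: tuple[Production, ...]  # P
    target: str                          # t

    def problems(self) -> list[str]:
        """List every condition of the definition that this grammar violates."""
        found = []
        if self.terminals & self.nonterminals:
            found.append("T and N are not disjoint")
        if self.target not in self.nonterminals:
            found.append("the target symbol is not in N")
        if not self.terminals or not self.productions:
            found.append("the grammar is trivial (T or P is empty)")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            # Context-free form: a single non-terminal on the left-hand side.
            if lhs not in self.nonterminals:
                found.append(f"left-hand side {lhs!r} is not in N")
            for sym in rhs:
                if sym not in symbols:
                    found.append(f"symbol {sym!r} is in neither T nor N")
        return found


# a ::= a + a | x, written with the chapter's letter conventions.
g = Grammar(
    terminals=frozenset({"x", "+"}),
    nonterminals=frozenset({"a"}),
    productions=(("a", ("a", "+", "a")), ("a", ("x",))),
    target="a",
)
print(g.problems())  # prints []
```

Returning a list of problems, rather than stopping at the first one, suits an editor that wants to report every defect in a grammar at once.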
1. Sentential forms. 

The basic intuitive concept underlying the idea of a
context-free grammar is the notion of derivation: the
replacement in a string of a single non-terminal symbol by
an equivalent string of terminals and non-terminals as
specified by some production.

Let G = { T, N, P, t } be a grammar, and let S(1)
and S(2) be strings of symbols. (We adopt the notational
convenience of using parenthesized integers to subscript
variable names.) Then we say S(1) derives S(2) in one step
if S(1) and S(2) have the form
     S(1) = XaZ,   S(2) = XYZ,
and there exists a production in the set P with the form
     <a> ::= Y.
In this case, we write
     S(1) => S(2).


In an analogous fashion, we may define the notion of
a leftmost derivation, for which the string X above contains
no non-terminal symbols.
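The one-step and leftmost relations can be sketched as functions on tuples of symbols. This is a minimal illustration under assumed names (derive_step, leftmost_step); it uses the example grammar a ::= a + a | x, written with the letter conventions adopted above, and represents a production <a> ::= Y as the pair (a, Y).

```python
# A sentential form is a tuple of symbols; a production (a, Y) stands for <a> ::= Y.
def derive_step(form, position, production):
    """One derivation step: rewrite XaZ as XYZ, where a sits at `position`."""
    lhs, rhs = production
    if not 0 <= position < len(form) or form[position] != lhs:
        raise ValueError(f"production for {lhs!r} does not apply at {position}")
    return form[:position] + tuple(rhs) + form[position + 1:]


def leftmost_step(form, nonterminals, production):
    """Leftmost step: X must contain no non-terminals, so rewrite the first one."""
    for position, sym in enumerate(form):
        if sym in nonterminals:
            return derive_step(form, position, production)
    raise ValueError("no non-terminal left: the form is already a word")


N = {"a"}
EXPAND = ("a", ("a", "+", "a"))   # a ::= a + a
LEAF = ("a", ("x",))              # a ::= x

form = ("a",)
for p in (EXPAND, LEAF, LEAF):
    form = leftmost_step(form, N, p)
print(form)  # ('x', '+', 'x')
```

A non-leftmost step is still available through derive_step directly, for example rewriting the second a in ('a', '+', 'a') first.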

A string S is said to derive a string S' in zero or
more steps, or simply to derive S', if one of the following
conditions is true: either S = S', or else there
exists a series of strings S(1), S(2), ..., S(N) such
that S => S(1), S(1) => S(2), ..., S(N) => S'. In this
case, we write
     S *=> S'.

A string W is said to be a sentential form of G if
t *=> W, where t is the target symbol of G. A sentential
form with no non-terminal symbols is called a word. The set
of all such words is called the language defined by G. Such
a language is called a context-free language, or "CFL".
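For grammars in which no production has an empty right-hand side, a sentential form never becomes shorter, so the relation S *=> S' can be decided by a bounded breadth-first search. The sketch below relies on that assumption (it is not valid for grammars with empty productions) and uses hypothetical function names; it also lists the words of length at most five for the grammar a ::= a + a | x.

```python
from collections import deque


def reachable_forms(productions, start, max_len):
    """All sentential forms S' with start *=> S' and len(S') <= max_len.

    Assumes no production has an empty right-hand side, so forms never shrink
    and anything longer than max_len can be discarded; this keeps the search
    finite. A symbol counts as a non-terminal if it has a production.
    """
    nonterminals = {lhs for lhs, _ in productions}
    seen = {start}                    # zero steps: start derives itself
    queue = deque([start])
    while queue:
        form = queue.popleft()
        for i, sym in enumerate(form):
            if sym not in nonterminals:
                continue
            for lhs, rhs in productions:
                if lhs == sym:        # one step: XaZ => XYZ
                    new = form[:i] + rhs + form[i + 1:]
                    if len(new) <= max_len and new not in seen:
                        seen.add(new)
                        queue.append(new)
    return seen


def derives(productions, s, s_prime):
    """Decide S *=> S' (under the no-empty-production assumption)."""
    return s_prime in reachable_forms(productions, s, max(len(s), len(s_prime)))


P = [("a", ("a", "+", "a")), ("a", ("x",))]
# Words are the reachable forms with no non-terminal; "a" is the only one here.
words = {f for f in reachable_forms(P, ("a",), 5) if "a" not in f}
print(sorted(words, key=len))
```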

A grammar is said to be ambiguous if there exists a
word in the language defined by the grammar with two or more
distinct leftmost derivations. There exist languages
defined by a context-free grammar that are inherently
ambiguous: that is, they cannot be defined by any
unambiguous context-free grammar.
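Ambiguity can be observed directly by counting the leftmost derivations of a word. The following sketch assumes no empty right-hand sides and no cycles of single-symbol productions (such as a ::= b, b ::= a), which keeps the recursive search finite; the function name is hypothetical. For the grammar a ::= a + a | x, the word x+x+x has two leftmost derivations, one for each grouping, so that grammar is ambiguous.

```python
def count_leftmost_derivations(productions, target, word):
    """Count the distinct leftmost derivations of `word` (a tuple of terminals).

    Forms longer than the word are pruned (valid only without empty
    right-hand sides), as are forms whose terminal prefix already disagrees.
    """
    nonterminals = {lhs for lhs, _ in productions}

    def count(form):
        # Locate the leftmost non-terminal; everything before it is terminal.
        for i, sym in enumerate(form):
            if sym in nonterminals:
                break
        else:
            return 1 if form == word else 0   # a word: does it match?
        if len(form) > len(word) or form[:i] != word[:i]:
            return 0
        # Each applicable production starts a different leftmost derivation.
        return sum(count(form[:i] + rhs + form[i + 1:])
                   for lhs, rhs in productions if lhs == sym)

    return count((target,))


P = [("a", ("a", "+", "a")), ("a", ("x",))]
print(count_leftmost_derivations(P, "a", ("x", "+", "x")))            # 1
print(count_leftmost_derivations(P, "a", ("x", "+", "x", "+", "x")))  # 2
```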

2. ARGOT notation. 

While BNF notation is convenient for theoretical
manipulations because it incorporates a single underlying
idea, that of replacement in accordance with a production, a
more expressive notation for practical specification of
languages is desirable.

For our purposes, we will adopt a system of notation
called ARGOT notation, with a concise yet powerful set of
replacement operators reminiscent of the operators used in
the theory of regular expressions. This notation was
developed as the core of a pattern-matching programming
language called ARGOT [MacLennan, 1975]. In fact, we will
use a restricted version of this notation, but it is
convenient to introduce the full notation first and then
restrict it as required. A formal description of ARGOT
notation is provided in Appendix A.

a. Rules and ARGOT expressions.

In place of a set of productions, ARGOT uses a
list of named rules, each of the form:
     name: expression.
Rule names perform the same role in ARGOT notation as
non-terminal symbols in BNF notation; however, it is required
that each rule have a unique rule name.
Terminal symbols or strings are denoted by
underlining, use of boldface type, or enclosure by quote
marks ("), whichever is appropriate for the typeface
available.

The colon corresponds to the BNF metasymbol
"::=", separating the rule name from the expression denoting
how an occurrence of that rule name may be expanded. Rules
are terminated by periods to separate rules unambiguously.

The expression half of a rule is an indefinitely
deep hierarchy of elementary replacement operations and
subexpressions, eventually terminating on the deepest
levels with terminal strings or rule names. Each operator
allows a specific replacement operation, which may be
thought of as being applied from the shallowest level of the
hierarchy downward in a non-deterministic fashion. Thus, a
single ARGOT rule corresponds to a number of equivalent BNF
productions.

b. Concatenation.

The simplest replacement operator is that of
concatenation, or replacement of a single construct by a
series of sub-constructs. The concatenation operator is
denoted by simple juxtaposition. Concatenated expressions
may be grouped into a single construct and used as a
subexpression by means of parentheses. A single BNF production
expresses the same idea as a simple ARGOT concatenation
(except that in ARGOT an "empty" rule cannot occur). Thus,
the BNF production
     <program> ::= program <identifier> <block> .
is equivalent to the ARGOT rule
     program: "program" identifier block "." .
The occurrence of a rule name means that that position in
the sequence is to be expanded as defined by the named rule,
while the occurrence of a terminal string means that that
position in the sequence is to be filled by the quoted
string.
c. Optional constructs.
An optional subexpression is surrounded by
brackets. The meaning of this operator is that at the
specified point, the indicated subexpression may either be
placed into the symbol string or omitted. Thus, the rule
     statement: [ label ] action.
allows replacement of "statement" by either "label action"
or by "action".
d. Alternation operators.

Two alternation operators are provided, simple
and optional alternation. Simple alternation is denoted by
means of a list of subexpressions separated by vertical
strokes and surrounded by curly brackets. The construct may
be expanded by choosing one of the sub-constructs as the
replacement. Thus, by the rule
     digit: { "0" | "1" | "2" }.
the rule name "digit" may be replaced by any one of "0",
"1", or "2".

The optional alternation construct is denoted in
the same way as a simple alternation, except that square
brackets are used instead of curly brackets. This operator
allows replacement not only by any of the indicated
alternatives, but also by the empty string. For example, the rule
     sign: [ "+" | "-" ].
allows the rule name "sign" to be replaced by "+", by "-",
or to be deleted (replaced by the empty string).
e. Iteration operators. 
Three iteration operators are provided. The
required iteration, or simple iteration, is denoted by a
plus sign followed by a subexpression. This construct
allows replacement by one or more instances of the
subexpression. Thus, the rule
     integer: + digit.
means that an instance of "integer" can be replaced by
"digit", by "digit digit", by "digit digit digit", etc.
Optional iteration, denoted by the asterisk followed by a
subexpression, implies that the construct can be
replaced by zero or more instances of the subexpression.
Thus, the rule
     astring: * "a".
allows expansion of the rule name "astring" to the empty
string, or to any of the strings "a", "aa", "aaa", etc.
The final form of iteration, list iteration, is
denoted by surrounding two subexpressions with a sharp sign
on the left and three periods on the right. It allows
replacement by one or more instances of the first
subexpression, separated by instances of the second
subexpression. Thus, the rule
     list: # atom "," ... .
allows replacement of the rule name "list" by "atom", "atom,
atom", "atom, atom, atom", etc.
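The replacement behaviour of each ARGOT operator can be summarized by a function that lists the symbol sequences an expression may be replaced by. The sketch below encodes expressions as nested Python tuples, a hypothetical representation chosen for illustration: ("term", s), ("rule", n), ("cat", [...]), ("opt", e), ("alt", [...]), ("optalt", [...]), ("plus", e), ("star", e), and ("list", e, sep). Because the iteration operators allow any number of repetitions, the enumeration is cut off at max_reps.

```python
from itertools import product


def expansions(expr, max_reps=3):
    """Yield each symbol sequence an ARGOT expression may be replaced by.

    Iterations are cut off at `max_reps` repetitions; the real operators
    allow any number.
    """
    kind = expr[0]
    if kind in ("term", "rule"):
        yield (expr[1],)
    elif kind == "cat":
        # Concatenation: one choice for each subexpression, in order.
        choices = [list(expansions(e, max_reps)) for e in expr[1]]
        for parts in product(*choices):
            yield tuple(sym for part in parts for sym in part)
    elif kind == "opt":                      # [ e ]: e or nothing
        yield ()
        yield from expansions(expr[1], max_reps)
    elif kind in ("alt", "optalt"):          # { e1 | e2 } or [ e1 | e2 ]
        if kind == "optalt":
            yield ()
        for e in expr[1]:
            yield from expansions(e, max_reps)
    elif kind in ("plus", "star"):           # + e (one or more), * e (zero or more)
        low = 1 if kind == "plus" else 0
        for n in range(low, max_reps + 1):
            yield from expansions(("cat", [expr[1]] * n), max_reps)
    elif kind == "list":                     # # e sep ... : e, e sep e, ...
        item, sep = expr[1], expr[2]
        for n in range(1, max_reps + 1):
            seq = [item] + [sep, item] * (n - 1)
            yield from expansions(("cat", seq), max_reps)
    else:
        raise ValueError(f"unknown operator {kind!r}")


statement = ("cat", [("opt", ("rule", "label")), ("rule", "action")])
print(sorted(expansions(statement)))  # [('action',), ('label', 'action')]
```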
f. Properties of the ARGOT notation. 

The most important feature of the notation is
that, although it is richer in operators and in this sense
more expressive than BNF notation, it is not more powerful.
A language is context-free if, and only if, it is
expressible as a finite set of ARGOT rules. This can be shown by
reducing ARGOT to BNF notation, that is, by providing
algorithms for transforming any finite set of context-free
BNF productions to an equivalent set of ARGOT rules, and vice
versa. This constructive proof is straightforward and
uninformative, as the desired transformations are fairly
evident on an intuitive level.
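The ARGOT-to-BNF direction of that reduction can be sketched by giving every non-concatenation operator a fresh helper non-terminal. This is one straightforward construction, not necessarily the one the authors had in mind. Expressions are encoded as nested tuples, a hypothetical representation: ("term", s), ("rule", n), ("cat", [...]), ("opt", e), ("alt", [...]), ("optalt", [...]), ("plus", e), ("star", e), ("list", e, sep). Iterations become right-recursive productions, and helpers are named <rule>_<k>, also an assumption. The result may contain unit productions such as integer ::= integer_1, which a tidier translation would inline.

```python
from itertools import count


def argot_to_bnf(rules):
    """Translate ARGOT rules {name: expression} into BNF productions.

    Each production is (lhs, rhs) with rhs a tuple of symbols; terminals are
    written in double quotes, and an empty rhs is the empty production.
    """
    productions = []
    fresh = count(1)

    def symbols(expr, owner):
        """Return the BNF symbol string that stands for `expr`."""
        kind = expr[0]
        if kind == "term":
            return (f'"{expr[1]}"',)
        if kind == "rule":
            return (expr[1],)
        if kind == "cat":
            return tuple(s for e in expr[1] for s in symbols(e, owner))
        helper = f"{owner}_{next(fresh)}"       # hypothetical helper name
        if kind == "opt":                       # h ::= (empty) | e
            alts = [(), symbols(expr[1], owner)]
        elif kind in ("alt", "optalt"):         # h ::= e1 | ... | en [| (empty)]
            alts = [symbols(e, owner) for e in expr[1]]
            if kind == "optalt":
                alts.append(())
        elif kind == "plus":                    # h ::= e | e h
            body = symbols(expr[1], owner)
            alts = [body, body + (helper,)]
        elif kind == "star":                    # h ::= (empty) | e h
            alts = [(), symbols(expr[1], owner) + (helper,)]
        elif kind == "list":                    # h ::= e | e sep h
            item = symbols(expr[1], owner)
            sep = symbols(expr[2], owner)
            alts = [item, item + sep + (helper,)]
        else:
            raise ValueError(f"unknown operator {kind!r}")
        productions.extend((helper, rhs) for rhs in alts)
        return (helper,)

    for name, expr in rules.items():
        productions.append((name, symbols(expr, name)))
    return productions


for lhs, rhs in argot_to_bnf({"integer": ("plus", ("rule", "digit"))}):
    print(lhs, "::=", " ".join(rhs))
```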

As originally defined, the complete ARGOT programming
language, which allows syntactically-keyed computation
as well as input and output parameters to be passed
between rules, has the full computational power of the
lambda calculus [MacLennan, 1975]. The notational subset we
are here calling "ARGOT notation" does not have the full
power of the ARGOT language defined in this reference.

The notation can also be regarded as a
generalization of the notion of a regular expression. We may
think of a set of ARGOT rules as being a set of named regular
expressions, and then allow rules to refer to themselves
directly or indirectly to achieve the power of a
context-free grammar. This notational similarity allows the simple
statement of a sufficient (but not necessary) condition for
the regularity of an ARGOT-defined language. If a finite
set of ARGOT rules can be arranged in such an order that the
right-hand side of each rule refers only to rules occurring
further down the list, the language defined is regular.
That this is so can be seen fairly readily. Such an ordering
allows every rule name other than that of the target to be
eliminated: working from the bottom of the list upward, each
occurrence of a rule name is replaced by the right-hand side
of the named rule, and because no rule refers to itself or to
any rule above it, this sequence of substitutions terminates.
The resulting single rule is simply a regular expression with
operators and terminal strings alone on the right-hand side.

This result is of practical use, since if we
know that a language is regular, then we know that simple
(non-recursive) algorithms exist for processing it. These
algorithms are considerably less complicated than those
required if the language is context-free but not regular,
in which case some sort of recursive mechanism is required.
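The sufficient condition for regularity amounts to asking whether the graph of references between rules is acyclic; an ordering with the required property is then a topological order of that graph. The sketch below assumes each rule has already been reduced to the list of rule names its right-hand side refers to, and the function name is hypothetical. A result of None only means that the sufficient condition fails, not that the language is non-regular.

```python
def regular_ordering(refs):
    """Order rule names so that each rule refers only to rules later in the
    list; return None if no such order exists (some rule is recursive).

    refs maps each rule name to the rule names on its right-hand side.
    """
    order, state = [], {}        # state: 1 = on the current path, 2 = finished

    def visit(name):
        if state.get(name) == 2:
            return True
        if state.get(name) == 1:
            return False         # name refers, directly or indirectly, to itself
        state[name] = 1
        for other in refs.get(name, ()):
            if not visit(other):
                return False
        state[name] = 2
        order.append(name)       # appended only after every rule it refers to
        return True

    for name in refs:
        if not visit(name):
            return None
    return order[::-1]           # referring rules first, referenced rules later


print(regular_ordering({"integer": ["digit"], "digit": []}))  # ['integer', 'digit']
print(regular_ordering({"expr": ["term"], "term": ["expr"]}))  # None
```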
3. Restricted ARGOT notation (R-ARGOT).

The full ARGOT notation, as described, has more
expressive power than required for the application we are
interested in, for two reasons:

-- its indefinitely nested structure requires recursive
   routines to access the subexpressions in a rule, and
-- highly nested expressions are too complicated to
   serve as easily-learned syntactic units for the user.
That the notation allows indefinite nesting is implied by
the fact that the notation itself is an inherently
context-free language. Since we shall be accessing the
grammatical descriptions of languages as databases, it is highly
desirable to be able to describe and encode simple, efficient
access routines. In addition, a simpler notation will allow
us to conceptualize a given grammar as consisting of a
collection of rules, each of which is formatted in one of a
finite number of ways.

What we would like is a notation that is expressible
as a regular expression (as is BNF notation) so that it is
easily processed, but that retains an adequate amount of
expressive power. These goals are met by appropriately
restricting the nesting allowed within ARGOT expressions. The
resulting notation is called R-ARGOT notation (for either
restricted or regular ARGOT).

The set of available operators is restricted to
concatenation, required iteration, simple alternation, list
iteration, and the optional operator. The other operators
are rendered superfluous by the nesting restriction.

R-ARGOT expressions (rule right-hand sides) may be
simple or complex. A simple expression is a concatenation
of one or more terminal strings, rule names, or optional
rule names. A complex expression is an alternation,
required iteration, or list iteration. Any subexpression
in an alternation or iteration must be a rule name. The
first subexpression in a list operation must be a
rule name; the second may be either a rule name or a
terminal string.

The effect of these rules is to limit the number of
possible formats available to the grammar designer to a
small set. Alternation and simple iteration operators will
always be the topmost operator in a given rule expression if
they occur at all, and their operands will be simple rule
names in such expressions. The list iteration operator must
also be topmost; only its second operand may be other
than a rule name, and if so, it must be a single terminal
string. Only if the concatenation operator is topmost may
the operands be alternations, and even in this case no
further operators are allowed in the rule.
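The R-ARGOT formats can be checked mechanically. This sketch follows the definition of simple and complex expressions given above, encoding an expression as a nested tuple, a hypothetical representation: ("term", s), ("rule", n), ("cat", [...]), ("opt", e), ("alt", [...]), ("plus", e), and ("list", e, sep). Operators outside the R-ARGOT set, such as optional iteration ("star") and optional alternation ("optalt"), are rejected. The function name is an assumption.

```python
def is_r_argot(expr):
    """Check that a rule right-hand side uses one of the R-ARGOT formats."""
    def is_name(e):
        return e[0] == "rule"

    def simple_item(e):
        # A terminal string, a rule name, or an optional rule name.
        return e[0] in ("term", "rule") or (e[0] == "opt" and is_name(e[1]))

    kind = expr[0]
    if kind == "cat":                    # simple expression
        return len(expr[1]) >= 1 and all(simple_item(e) for e in expr[1])
    if simple_item(expr):                # a concatenation of length one
        return True
    if kind == "alt":                    # every alternative must be a rule name
        return all(is_name(e) for e in expr[1])
    if kind == "plus":                   # required iteration of a rule name
        return is_name(expr[1])
    if kind == "list":                   # rule name, then rule name or terminal
        return is_name(expr[1]) and expr[2][0] in ("term", "rule")
    return False                         # any other operator or nesting


program = ("cat", [("term", "program"), ("rule", "identifier"),
                   ("rule", "block"), ("term", ".")])
print(is_r_argot(program))                                       # True
print(is_r_argot(("alt", [("term", "0"), ("term", "1")])))       # False
```

The second example fails because its alternatives are terminal strings rather than rule names, which is exactly the restriction that leads to the renaming problem discussed next.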

It is something of a surprise that such stringent
restrictions result in grammars that are reasonably well
oriented toward human comprehension. The rules that result,
when they are read informally, seem to express natural
syntactic units. It must be admitted that an improvement in
human comprehensibility might be attained by allowing one
level of nesting. However, the simplifications in the
rule-access algorithms provided by naming each
subexpression are so striking that we have been led to retain
R-ARGOT as described here.

The languages defined in Appendices A and B are
defined using the R-ARGOT notation. In particular, the
reader's attention is drawn to Appendix B, which contains a
grammar for the PASCAL programming language. Most of the
syntactic rules can be seen to correspond to natural
syntactic constructs within the language in a way that BNF
productions do not.

One irritation encountered in the use of R-ARGOT is
the implicit requirement to rename terminal strings which
carry semantic information (that is, that occur as
alternatives within an alternation). Where we would like to
write, for instance, rules such as

     string: + character.
     character: { "a" | "b" | ... | "z" }.

we must instead write

     string: + character.
     character: { a | b | ... | z }.
     a: "a".   b: "b".   ...   z: "z".


To avoid the necessity to provide a large number of trivial
rules renaming tokens, we shall assume the existence of a
facility in the system for escaping from the normal mode of
grammar-driven synthesis to predefined lexical synthesizers.
Such a facility is analogous to the separation of the
analysis task between the parser and scanner in a
conventional compiler. Thus, we will assume that predefined rules
exist with such names as "identifier", "integer", "string",
etc. In the system to be implemented, these rule names
correspond to predefined input scanners and parsers
available to the language implementer.
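A predefined lexical rule can be pictured as a pattern that accepts or rejects a whole token at once, instead of being expanded by derivation steps. The patterns and the names PREDEFINED and synthesize_token below are assumptions for illustration (roughly Pascal-like conventions); the text names only the rules "identifier", "integer", and "string", and does not specify their lexical syntax.

```python
import re

# Hypothetical lexical synthesizers for the predefined rule names; the
# patterns are illustrative assumptions, not a specification.
PREDEFINED = {
    "identifier": re.compile(r"[A-Za-z][A-Za-z0-9]*"),
    "integer": re.compile(r"[0-9]+"),
    "string": re.compile(r"'(?:[^']|'')*'"),   # quotes doubled inside
}


def synthesize_token(rule_name, text):
    """Escape from grammar-driven synthesis: accept `text` as a complete
    instance of a predefined rule, or reject it immediately."""
    pattern = PREDEFINED.get(rule_name)
    if pattern is None:
        raise KeyError(f"{rule_name!r} is not a predefined lexical rule")
    if pattern.fullmatch(text) is None:
        raise ValueError(f"{text!r} is not a valid {rule_name}")
    return (rule_name, text)   # a leaf: the rule name plus its spelling


print(synthesize_token("integer", "42"))  # ('integer', '42')
```

Rejecting a malformed token at the moment it is typed is in keeping with the immediate error rejection claimed for grammar-driven synthesis.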


C. A SIMPLE GRAMMAR-DRIVEN STRING EDITOR 

In this section, a simple mechanism capable of generating
sentential forms from an input grammar in BNF notation is
described. This mechanism serves as the fundamental model
for grammar-driven editing using interactive production
selection to direct the course of the synthesis.
1. The Basic Mechanism.

We may think of the basic mechanism, which will be
hereafter referred to as a Grammar-Driven String Editor
(GDSE), as a multitape Turing Machine with two input tapes,
labeled PHASE1 INPUT and PHASE2 INPUT, four internal tapes
labeled GRAMMAR, BUFFER, CURSOR, and PRODUCTION, and an
output tape labeled OUTPUT. The PHASE1 INPUT tape contains a
context-free BNF grammar, which is stored internally on the
GRAMMAR tape. The PHASE2 INPUT tape contains a series of
editing commands which will be more fully described shortly.
The BUFFER tape is used as a work area to synthesize a
sentential form. The CURSOR and PRODUCTION tapes are used to
hold indefinitely large integers which number the
non-terminal in the BUFFER currently being expanded, and the
production being applied from the GRAMMAR tape,
respectively. The OUTPUT tape is provided simply as a
conceptual convenience: it is used to model the transfer of the
final form produced to secondary storage.

The operation of the mechanism is as follows: 

a. Phase One: Copy and Check Grammar.

The PHASE1 INPUT tape is copied onto the GRAMMAR
tape. As this is done, the contents of the input tape are
parsed in accordance with the grammar listed in Appendix A
for BNF notation. Since this grammar is regular, the input
tape can be rejected or accepted as a legitimate context-
free grammar in a finite number of steps. Without loss of
generality, we assume that the first production names the
target symbol as its left-hand side.
b. Phase Two: Initialization.

In phase two the mechanism is used to generate
sentential forms via valid derivation steps on the BUFFER
tape. First, the target non-terminal is copied from the
first production onto the BUFFER tape. Then the following
loop is executed. Each cycle corresponds to one step of a
valid derivation.

c. Phase Two: Loop.

A symbol is read from the PHASE2 INPUT tape. If
it is 'Q' (for 'Quit'), control is passed to the next step
beyond the loop.

If the order to quit is not received, two
integers are copied from the PHASE2 INPUT tape. These
integers are assumed to encode the relative position in the
buffer of the next non-terminal to be replaced, and the pro-
duction in the grammar to be used to replace it. Both of
the integers must be checked to be sure that they refer to a
real non-terminal in the BUFFER and to a real production in
the GRAMMAR. If they do, the left-hand side of the selected
production is checked to make sure it is the same as the
selected non-terminal. If any of these checks fail, the
integers are simply ignored and the loop re-entered from the
beginning. Otherwise, the indicated replacement is per-
formed. In detail, the mechanism performs the following
steps.


First, an integer (suitably encoded) is read
from PHASE2 INPUT and placed on the CURSOR tape. Sup-
pose this integer is N. The N'th non-terminal symbol on the
BUFFER tape is located. If there is none, control is
returned to the top of the loop.

Another integer is then read from PHASE2 INPUT
and copied onto the PRODUCTION tape. Suppose it is M. The
M'th production is located; if there is none, control is
returned to the top of the loop.

The heads are then moved to the N'th non-
terminal on the BUFFER tape and to the left-hand side of the
M'th production, and the two non-terminals are compared. If
they are not the same, control is returned to the top of the
loop.

If they are the same, the right-hand side of the
M'th production is used to replace the N'th non-terminal on
the BUFFER tape, moving characters to the right to make room
for the new symbols as needed.

Finally, control is returned to the top of the
loop.

d. Phase Two: End.
The BUFFER tape is copied to OUTPUT and the
machine halts, accepting.
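The phases above can be condensed into a short Python sketch. It is an illustration under stated assumptions, not the original machine: the grammar is a list of (left-hand side, right-hand side) pairs whose first entry names the target symbol, non-terminals are written as `<name>`, and the PHASE2 INPUT tape is modelled as a sequence of `(n, m)` pairs ending with `"Q"`. The function name `run_gdse` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple, Union

# Assumed encoding: a production is (lhs, rhs); symbols written "<name>"
# are non-terminals and every other symbol is a terminal.
Production = Tuple[str, Tuple[str, ...]]
Command = Union[Tuple[int, int], str]


def is_nonterminal(sym: str) -> bool:
    return len(sym) > 2 and sym.startswith("<") and sym.endswith(">")


def run_gdse(grammar: Sequence[Production],
             tape: Sequence[Command]) -> Optional[List[str]]:
    """Phase Two of the GDSE: return the OUTPUT form, or None on rejection."""
    buffer = [grammar[0][0]]                 # initialization: target symbol
    for command in tape:
        if command == "Q":                   # quit: copy BUFFER to OUTPUT
            return buffer
        n, m = command                       # CURSOR and PRODUCTION values
        nonterminals = [i for i, s in enumerate(buffer) if is_nonterminal(s)]
        if not 1 <= n <= len(nonterminals):  # no N'th non-terminal: ignore
            continue
        if not 1 <= m <= len(grammar):       # no M'th production: ignore
            continue
        pos = nonterminals[n - 1]
        lhs, rhs = grammar[m - 1]
        if buffer[pos] != lhs:               # left-hand sides differ: ignore
            continue
        buffer[pos:pos + 1] = list(rhs)      # replace, shifting symbols right
    return None                              # the 'Q' was missing: reject
```

With the toy grammar `[("<expr>", ("<expr>", "+", "<term>")), ("<expr>", ("<term>",)), ("<term>", ("x",))]`, the tape `[(1, 3), (1, 1), (1, 2), (9, 1), (1, 3), (1, 3), "Q"]` yields `['x', '+', 'x']`: the first and fourth commands fail their checks and are silently ignored, as the text prescribes.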
Synopsis.

The algorithm described is nothing more than a
restatement, in somewhat more detailed terms, of the funda-
mental method for producing some valid sentential form under
a context-free grammar. Determinism has been introduced by
using an additional input phase, which encodes, as the
derivation proceeds, choices for the next non-terminal to be
expanded and the production to be used. Erroneous input
during this phase is ignored. This simple mechanism cap-
tures the essential flavor of grammar-driven synthesis. We
may note that the contents of the PHASE2 INPUT tape may be
obtained in sequence when they are needed, and are never
re-used. Thus this input process serves as an entirely
adequate model for an interactive process. Throughout the
remainder of this section we will assume that the "Phase
Two user" is able to examine the internal state of the
machine in order to determine the current state of the syn-
thesis and decide what to do next. We make this assumption
to avoid cluttering the mechanism descriptions with output
routines, which do not have any impact on the current state
of the synthesis in any event.

Properties of the GDSE.
The fundamental property possessed by the GDSE is
that it never contains an invalid form in the BUFFER, and
that a PHASE2 INPUT string exists which will cause the
machine to halt, accepting, with any desired sentential form
on the OUTPUT tape.

In one sense, these assertions are hardly suscepti-
ble to a convincing proof, since the mechanism is so obvi-
ously related to the notion of valid derivation in the first
place that any proof is likely to be less convincing than
this intuition. The proof can be carried through based on
an induction over the number of times the mechanism passes
through the loop. Since the BUFFER contains a valid senten-
tial form (the target symbol) when the loop is entered the
first time, and each step in the loop either leaves the
BUFFER unchanged or changes one valid form to another by
expanding a single non-terminal in accordance with a produc-
tion in the input grammar, the BUFFER contains a valid sen-
tential form whenever the loop is entered. When the 'Q'
symbol is read, the last form generated is placed on the
OUTPUT tape prior to acceptance. (The machine may reject if
the 'Q' symbol is missing.)

Given a desired sentential form, there exists some
valid derivation sequence, starting with the target symbol,
such that each form derives the next in one step, and the
last is the desired form. (There may be more than one such
sequence of steps.) Each step consists of the selection of a
non-terminal in the last form derived and its replacement by
the right-hand side of some production. Thus, given the list
of derivation steps, it is easy to construct a list of pairs
of integers for the PHASE2 INPUT tape which will recreate
these steps in the BUFFER. Hence for any sentential form,
there exists a PHASE2 INPUT tape which will cause that form
to appear in the BUFFER. Appending a 'Q' to this tape will
cause the machine to halt, accepting, with the desired form
on the OUTPUT tape.
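The constructive half of this argument, turning a derivation into a PHASE2 INPUT tape, can be sketched in Python. The encoding is assumed for illustration (productions as (lhs, rhs) pairs, non-terminals written `<name>`, a derivation given as its list of sentential forms), and `phase2_tape` is a hypothetical name. For each step it searches for a non-terminal position and production whose replacement produces the next form.

```python
from typing import List, Sequence, Tuple, Union

Production = Tuple[str, Tuple[str, ...]]
Command = Union[Tuple[int, int], str]


def is_nonterminal(sym: str) -> bool:
    return len(sym) > 2 and sym.startswith("<") and sym.endswith(">")


def one_step(grammar: Sequence[Production], prev: List[str],
             nxt: List[str]) -> Tuple[int, int]:
    """Find (n, m) such that replacing the n'th non-terminal of prev by the
    right-hand side of production m (both 1-based) yields nxt."""
    n = 0
    for i, sym in enumerate(prev):
        if not is_nonterminal(sym):
            continue
        n += 1
        for m, (lhs, rhs) in enumerate(grammar, start=1):
            if lhs == sym and prev[:i] + list(rhs) + prev[i + 1:] == nxt:
                return n, m
    raise ValueError(f"not a one-step derivation: {prev} => {nxt}")


def phase2_tape(grammar: Sequence[Production],
                derivation: Sequence[Sequence[str]]) -> List[Command]:
    """Encode a derivation (its list of sentential forms) as a PHASE2 tape."""
    tape: List[Command] = [
        one_step(grammar, list(a), list(b))
        for a, b in zip(derivation, derivation[1:])
    ]
    tape.append("Q")                      # halt, accepting the last form
    return tape
```

When a step is ambiguous (several pairs give the same next form), the first match is taken; any of them would recreate the step, which is all the proof requires.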

3. Discussion. 

As previously mentioned, although conceptually sim-
ple, the GDSE is the underlying model for all of our more
elaborate grammar-driven mechanisms. The GDSE plays a role
for grammar-driven synthesizers analogous to that played by
a Deterministic Push-Down Automaton (DPDA) for parser-based
systems. The fundamental simplicity of grammar-driven syn-
thesizers arises from the fact that this underlying mechan-
ism is a direct restatement, with determinism incorporated,
of the very notion of a sequence of steps in a valid deriva-
tion. The resulting simplicity is to be contrasted with the
much more complicated "set of items" construction required
to generate the DPDA associated with a grammar, which causes
the relation between a grammar and its parser to be very
indirect [Aho and Ullman 1977]. The GDSE utilizes the gram-
mar directly to synthesize words, rather than using it
indirectly to produce a derivative mechanism able to decode
words.

We might note that we have allowed the output of the
GDSE to be any valid sentential form, not requiring it to be
composed strictly of terminal symbols. In other words, we
are taking as the fundamental entity defined by a grammar a
sentential form instead of a word. It is easy enough to fix
up the mechanism so that before halting it checks the
string in the BUFFER for non-terminals and accepts only if
there are none. Our decision not to do so is based on the
philosophy that additional restrictions should not be intro-
duced so long as the output without them is sensible. In
practical terms, a valid sentential form under a grammar for
a programming language corresponds to a partially complete,
yet well-structured program, with the missing parts labeled
appropriately by non-terminal symbols. In fact, the ability
to deal with such "reasonable" partial programs is one of
the primary advantages of a programming system based on
grammar-driven synthesis.

Retaining this capability yields an even more
interesting property. No problem develops if the GDSE
encounters a non-terminal in the right-hand side of some
production which is undefined. Once this non-terminal is
copied into the BUFFER it can never be replaced, so once
this action has been taken a word will never be derived.
However, the use of an undefined non-terminal can yield a
class of sentential forms. In the context of grammars
defining programming languages, the described situation
might occur if some subset of the complete grammar for the
target language was in use. The resulting form would be
meaningful, and would lead to a complete program once the
complete grammar was defined.

Thus, we see that the class of grammar-driven syn-
thesizers to be described has the ability to deal intelli-
gently not only with partial programs, but also with
partially complete grammars, in a natural way.

Finally, we note that ambiguous grammars present no
problem for the GDSE. If the input grammar is ambiguous,
this simply means that there is more than one way to gen-
erate at least one sentential form.

The question that remains to be answered is whether
grammar-driven synthesizers can be used to synthesize more
interesting constructs than strings (for instance, some data
structure encoding the algorithm represented by the word).
In addition, it is desirable to use a more human-oriented
input code. In the remainder of this chapter, first the
command and then the synthesis capabilities will be
improved. The resulting mechanisms will inherit the basic
properties of the GDSE, however, which remains our fundamen-
tal model for grammar-driven synthesis.


D. AN IMPROVED GRAMMAR-DRIVEN STRING EDITOR 

In this section we improve the Phase Two command mechan-
ism for the GDSE. The R-ARGOT notation is our primary tool
for doing this. This notation provides for a concise and
human-oriented set of rules as the grammar definition,
allows automatic expansion of rule names when there is only
one way for expansion to be done, and provides a framework
for selection of alternative expansion paths based on keying
the desired alternative by means of a mnemonic keystroke.
Yet the regularity of the notation allows synthesis to
proceed in a straightforward, non-recursive fashion, pri-
marily because the contents of the rule can be accessed by a
finite automaton. These properties are not coincidental,
since the desire to achieve them provided the primary
motivation for restricting the ARGOT notation in the way
chosen.
1. Rules and transformations. 

We eventually would like to classify every possible
rule name replacement according to some finitely expressible
scheme. To this end, we distinguish between the terms
"rule" and "transformation". For BNF notation, each produc-
tion can result in one, and only one, transformation of a
non-terminal symbol to a string of symbols. For ARGOT and
R-ARGOT notations, in contrast, each rule may express more
than one such permissible transformation. The limited nest-
ing of R-ARGOT operators allows us to list all of the
transformations allowed for an R-ARGOT grammar in a finite
list.

In order to further reduce the set of transforma-
tions possible, we introduce a special class of symbols,
assumed to be distinct from both rule names and terminal
strings, which we will call "e-symbols". They have
the purpose of serving as place markers in a sentential
form, indicating points where optional strings formed
according to a particular transformation may be inserted.
We will use three classes of such symbols, with the notation
"o(rule name)", "i(rule name)", and "I(rule name)". The
characters "o", "i" and "I" will be used to encode the exact
sort of transformation by which the symbol can be replaced,
and the rule name argument will allow the mechanism to
access the symbols in the grammar by which they can be
replaced. Since their expansion is optional, for output
purposes we may think of all of these symbols as represent-
ing the empty string. When the buffer is to be copied to
output, these symbols are simply skipped.

With this notation in hand, we examine the four
sorts of R-ARGOT rules: concatenations, alternations,
iterations, and list iterations.

Concatenations involve replacement of the rule name
by a sequence of terminal symbols, rule names, and optional
rule names. These elements must occur in order exactly as
specified in the rule. Any optional rule names are con-
verted to the e-symbol "o(rule name)" when they are encoun-
tered. Thus, the rule:

array-type: ( packed ) "array" "[" ranges "]" "of" type.
allows replacement of the rule name <array-type> in the buffer by
o(packed) array [ <ranges> ] of <type> 


(In this section, we shall delimit rule names in the buffer
with angle brackets so that they cannot be confused with
terminal strings.) If the symbol "o(packed)" is never
replaced, this string would be copied to the output tape
simply as
array [ <ranges> ] of <type>
We see that a concatenation rule explicitly stands for a
single, invariant transformation. Implicit in the existence
of an optional field, however, is an additional transforma-
tion of the form
o(rule name) => <rule name>
The use of an e-symbol has allowed us to express what would
have been one transformation with an indefinite format as
an indefinitely long (but finite) list of transformations,
each of fixed format. This notational trick will be further
used in the next chapter to make the list of transformations
associated with a grammar even more regular.
Alternation rules are always of the form:
name: { name1 ; name2 ; . . . ; name-n }.
and correspond to n transformations:
<name> => <name1>
<name> => <name2>
. . .
<name> => <name-n>
Iteration rules correspond to two transformations: that per-
formed when the rule name is first replaced, and that
corresponding to additional iterations. Thus, a rule of the
form:
name: + name1.
corresponds to the two transformations:
<name> => <name1> i( name )
i( name ) => <name1> i( name )


List iteration rules similarly consist of two
transformations. A rule of the form:
name: # name1 name2.
corresponds to the transformations:
<name> => <name1> I( name )
I( name ) => <name2> <name1> I( name )
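The four rule shapes and their finite transformation lists can be made concrete with a minimal Python sketch. The tuple encoding of rules is a hypothetical stand-in for parsed R-ARGOT: `("concat", [("term", s) | ("name", r) | ("opt", r), ...])`, `("alt", [r1, r2, ...])`, `("iter", r1)` for `name: + r1.`, and `("list", r1, r2)` for `name: # r1 r2.`. The function `transformations` is likewise hypothetical.

```python
from typing import List, Tuple

Transformation = Tuple[str, Tuple[str, ...]]


def transformations(name: str, rule: tuple) -> List[Transformation]:
    """List every transformation an R-ARGOT rule allows (assumed encoding)."""
    kind, lhs = rule[0], f"<{name}>"
    if kind == "concat":
        rhs, extra = [], []
        for tag, value in rule[1]:
            if tag == "term":
                rhs.append(value)                      # terminal string
            elif tag == "name":
                rhs.append(f"<{value}>")               # required rule name
            else:                                      # optional rule name
                rhs.append(f"o({value})")
                extra.append((f"o({value})", (f"<{value}>",)))
        return [(lhs, tuple(rhs))] + extra
    if kind == "alt":
        return [(lhs, (f"<{alt}>",)) for alt in rule[1]]
    if kind == "iter":
        item, mark = f"<{rule[1]}>", f"i({name})"
        return [(lhs, (item, mark)), (mark, (item, mark))]
    if kind == "list":
        item, sep, mark = f"<{rule[1]}>", f"<{rule[2]}>", f"I({name})"
        return [(lhs, (item, mark)), (mark, (sep, item, mark))]
    raise ValueError(f"unknown rule kind {kind!r}")
```

For the `array-type` rule above this yields one concatenation transformation plus `o(packed) => <packed>`; each alternation contributes one transformation per alternative, and each kind of iteration contributes exactly two.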
2. Automatic synthesis. 

Having listed all possible transformations, we may
now determine which of them can be performed automatically.
Given a rule name, the type of rule is effectively comput-
able from the form of the right-hand side of the rule alone.
If the rule is an alternation, the user must be consulted in
order to determine which of the n possible transformations
is required. If the rule is a concatenation, there is only
one possible expansion. If the rule is a simple iteration
or list iteration, the initial transformation is required
and should be automatically performed. It may be recalled
that predefined rule names (such as "identifier") are
allowed in an R-ARGOT grammar to symbolize calls to prede-
fined input scanners. Such rule names do not admit of expan-
sion by rule, but must be expanded by referral to the prede-
fined scanner, which may solicit data from the user. Hence,
predefined rules cannot be automatically expanded. There is
one other possibility: the rule name may be undefined. In
this case, no expansion of any kind is possible.

Terminal symbols, by definition, cannot be expanded.
The e-symbols all require user attention, so they also
cannot be automatically expanded.

As a matter of terminology, we may classify symbols
in the buffer as bound, free, or transient.

Bound symbols are those which admit of no further
replacement. Thus, in our system undefined rule names and
terminal symbols are bound.

Free symbols are those which require a decision as
to whether or not they are to be replaced at all, or by what
transformation they are to be replaced. The free symbols
are thus names for alternation rules and predefined rules,
as well as the e-symbols.

The remaining symbols can be transformed by one, and
only one, transformation, which is not optional. They
represent intermediate steps of a required replacement
sequence, may be automatically replaced without restricting
the range of words which can be formed from the sentential
form currently in the buffer, and thus may be regarded as
"transient" in the sense that they are retained only until
they are recognized and automatically replaced by their
equivalent. The transient symbols in the described sys-
tem are names of concatenations, iterations, and list itera-
tions.
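The three-way classification can be expressed as a single function. This is a sketch under assumptions: rules are kept in a dictionary mapping each rule name to a tuple whose first element is its kind (`"alt"`, `"concat"`, `"iter"` or `"list"`), predefined scanner names are kept in a separate set, rule names in the buffer are written `<name>`, and no terminal string looks like an e-symbol. The name `classify` is hypothetical.

```python
from typing import Dict, Set

E_PREFIXES = ("o(", "i(", "I(")


def classify(sym: str, rules: Dict[str, tuple], predefined: Set[str]) -> str:
    """Return 'bound', 'free' or 'transient' for one buffer symbol."""
    if sym.startswith(E_PREFIXES) and sym.endswith(")"):
        return "free"                  # e-symbols await a user decision
    if not (len(sym) > 2 and sym.startswith("<") and sym.endswith(">")):
        return "bound"                 # terminal symbol
    name = sym[1:-1]
    if name in predefined:
        return "free"                  # handed to a predefined scanner
    if name not in rules:
        return "bound"                 # undefined rule name
    if rules[name][0] == "alt":
        return "free"                  # user must pick an alternative
    return "transient"                 # concatenation or (list) iteration
```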

Since the expansion of transient symbols can only be
done in one way, at the beginning of each Phase Two loop we
would like to search the buffer for transient symbols and
expand each one found, continuing this process until all
symbols are either free or bound. Unfortunately, for
unrestricted R-ARGOT grammars, there is no guarantee that
this process will terminate. If one can start with a con-
catenation, iteration, or list iteration rule and reach the
same rule by applying a sequence of rules that passes through
no optional element and no alternation rule, the described
process may never terminate. Therefore, we must restrict the
grammar so that no such cycles exist.

Fortunately, the existence or non-existence of such
cycles can be effectively computed given an otherwise syn-
tactically correct R-ARGOT grammar. This restriction is the
only semantic constraint we place on R-ARGOT grammars for
the remainder of the discussion. The loss in expressive
power is not great. Such cycles correspond to recursive
expressions with no trivial case in BNF-described languages,
and once entered, derive only forms with non-terminals and
never words.
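One way to perform this check, offered as a sketch rather than the author's procedure, is a depth-first search over a graph whose nodes are the transient rules and whose edges follow only the rule names that autoscanning must expand: the required names of a concatenation, and the first item of an iteration or list iteration. Optional elements and alternations contribute no edges, so they break chains exactly as the text requires. The rule encoding and the names `required_names` and `has_autoscan_cycle` are assumptions.

```python
from typing import Dict, List

TRANSIENT_KINDS = ("concat", "iter", "list")


def required_names(rule: tuple) -> List[str]:
    """Rule names that autoscanning will expand unconditionally."""
    kind = rule[0]
    if kind == "concat":
        return [value for tag, value in rule[1] if tag == "name"]
    if kind in ("iter", "list"):
        return [rule[1]]          # only the first item is required
    return []                     # an alternation stops autoscanning


def has_autoscan_cycle(rules: Dict[str, tuple]) -> bool:
    """True if some transient rule can reach itself through transient rules."""
    transient = {n for n, r in rules.items() if r[0] in TRANSIENT_KINDS}
    state = {n: "new" for n in transient}      # new -> active -> done

    def visit(name: str) -> bool:
        state[name] = "active"
        for succ in required_names(rules[name]):
            if succ not in transient:
                continue          # free or bound symbol: the chain ends
            if state[succ] == "active":
                return True       # back edge: autoscanning would not stop
            if state[succ] == "new" and visit(succ):
                return True
        state[name] = "done"
        return False

    return any(state[n] == "new" and visit(n) for n in sorted(transient))
```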

With this restriction, which can be enforced by
checking the input grammar during Phase One, we now may
allow automatic expansion of transient symbols at the
beginning of the Phase Two loop prior to any further pro-
cessing, with the understanding that such expansion is to be
performed until no transient symbols remain. With the gram-
mar restricted as described, this process must always ter-
minate. Since the grammar is context-free, the order in
which transient symbols are expanded is of no consequence.
We will refer to the automatic expansion of all transient
symbols until none remain as "autoscanning".

The addition of the autoscanning feature relieves
the Phase Two user of the burden of having to order expan-
sions that are required by the grammar. The price paid for
this facility is that only those forms can be produced which
consist entirely of bound and free symbols. In the context
of a programming language defined by a grammar, the system
will now synthesize as much of the program as is syntacti-
cally deducible from the part of the program already created
by the user.

As a concrete example, we display the results of
autoscanning the target symbol for the PASCAL grammar listed
in Appendix B:

program <identifier> ( <identifier> I(filelist) ) ;
o(labels)
o(constants)
o(types)
o(variables)
o(subroutines)
begin
<statement>
I(statements)
end.
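Autoscanning itself can be sketched as a left-to-right rewrite that re-examines each position after expanding it, so that nested transient symbols are expanded too. The rule encoding is an assumption (the same tuples as above: `"concat"`, `"iter"`, `"list"`, `"alt"`), names missing from the dictionary are treated as predefined or undefined, and the grammar fragment in the test is a hypothetical miniature chosen to reproduce the shape of the PASCAL example, not Appendix B itself.

```python
from typing import Dict, List


def initial_expansion(name: str, rule: tuple) -> List[str]:
    """The single, obligatory transformation of a transient rule name."""
    kind = rule[0]
    if kind == "concat":
        out = []
        for tag, value in rule[1]:
            if tag == "term":
                out.append(value)
            elif tag == "name":
                out.append(f"<{value}>")
            else:                                  # optional element
                out.append(f"o({value})")
        return out
    if kind == "iter":
        return [f"<{rule[1]}>", f"i({name})"]
    if kind == "list":
        return [f"<{rule[1]}>", f"I({name})"]
    raise ValueError(f"{name!r} is not transient")


def autoscan(buffer: List[str], rules: Dict[str, tuple]) -> List[str]:
    """Expand transient symbols until only free and bound symbols remain.

    Assumes the grammar passed the Phase One cycle check; otherwise this
    loop need not terminate."""
    out = list(buffer)
    i = 0
    while i < len(out):
        sym = out[i]
        is_name = len(sym) > 2 and sym.startswith("<") and sym.endswith(">")
        rule = rules.get(sym[1:-1]) if is_name else None
        if rule is None or rule[0] == "alt":
            i += 1        # terminal, e-symbol, predefined, undefined or alternation
            continue
        out[i:i + 1] = initial_expansion(sym[1:-1], rule)   # re-examine position i
    return out
```

Because the grammar is context-free, this left-to-right order is just one convenient choice; any order of expansion reaches the same buffer.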

3. Improved Cursor Control.

The next improvement to be described is a more use-
ful method of cursor placement.

From the analysis above, we see that after autoscan-
ning is performed, the buffer will contain only bound and
free symbols. By definition, the only symbols requiring
Phase Two input data for further expansion are free symbols,
since bound symbols admit of no expansion at all. It fol-
lows that the cursor should always rest on a free symbol.
If there are no free symbols, there are no symbols left to
expand in the buffer, and the loop may be left, the buffer
copied to the output tape, and the algorithm terminated. In
general, however, one or more free symbols will be left in
the buffer at the end of autoscan. We wish to allow the
user a means to move the cursor between them, and must also
decide what to do after the symbol indicated by the cursor
has been expanded. It should be clear that cursor movement
never has any effect on either the contents of the buffer
or the valid derivations reachable at any point in the
synthesis. The first is true simply because cursor movement
leaves the buffer unchanged, and the second because of the
context-free nature of the expansion operation.

Accordingly, after autoscanning, if there are any
free symbols left, we allow the user to move the cursor back
and forth by entering zero or more cursor control symbols
(represented by "=>" for movement right and by "<=" for
movement left).

The only question remaining is how to position the
cursor initially, and how to reposition it after a symbol is
expanded. We assume that after a symbol is expanded, the
buffer is autoscanned again to remove any new transient sym-
bols. If the section of the buffer replacing the expanded
symbol now contains one or more free symbols, the cursor is
placed at the leftmost such symbol. Otherwise, it is placed
at the first free symbol in the remaining string of symbols
to its right. If there are none, wraparound takes place and
the cursor is placed at the first free symbol in the old
substring to the left. Initially, the cursor is placed at
the first free symbol in the buffer.
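The repositioning rule amounts to a search order: the new text first, then the rest of the buffer to the right, then a wraparound to the left. The sketch below assumes the autoscanned buffer has been summarized as a list of `'free'`/`'bound'` tags and that the replacement occupies `buffer[start:end]`. The helper `move` for the `=>`/`<=` commands is a further assumption: the text does not say what happens at the ends of the buffer, and here the cursor simply stays put. All names are hypothetical.

```python
from typing import List, Optional


def reposition(kinds: List[str], start: int, end: int) -> Optional[int]:
    """Cursor index after an expansion, or None if nothing is left to expand.

    kinds[i] is 'free' or 'bound' for the autoscanned buffer, and
    buffer[start:end] is the text that replaced the expanded symbol."""
    order = (list(range(start, end))            # the new text, leftmost first
             + list(range(end, len(kinds)))     # then the rest to the right
             + list(range(0, start)))           # then wrap around to the left
    for i in order:
        if kinds[i] == "free":
            return i
    return None


def initial_cursor(kinds: List[str]) -> Optional[int]:
    return reposition(kinds, 0, 0)              # first free symbol in the buffer


def move(kinds: List[str], cursor: int, step: int) -> int:
    """'=>' is step=+1 and '<=' is step=-1; the cursor stays put at the ends."""
    i = cursor + step
    while 0 <= i < len(kinds):
        if kinds[i] == "free":
            return i
        i += step
    return cursor
```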

4. Transformation Selection. 

Finally, we address the problem of causing an
optional transformation to be applied once the cursor has
been positioned as desired by the user.

From the discussions above, the cursor must be rest-
ing on a free symbol, that is, at either a predefined rule
name, the rule name for an alternation, or an e-symbol
of type o, i, or I. To simplify the command language model,
the entry of a blank is adopted as the uniform means of
indicating that an expansion is to take place at the current
cursor position. If the cursor is at a predefined rule
name, control is then turned over to the indicated prede-
fined input scanner. If it is at an e-symbol, the appropri-
ate transformation is made, the result autoscanned, and the
cursor repositioned for another loop through the cycle.
Finally, if the cursor is at the rule name for an alterna-
tion, one of many potential transformations must be
selected. Another symbol is entered and this is matched to
keystrokes included in the rule body.

Thus, we must extend the R-ARGOT notation to allow
inclusion of the keystroke for each alternative which will
trigger it. An alternation now looks like:

statement: { 'a' assignment
           ; 'i' if-statement
           ; 'w' while-statement
           ; 'c' case-statement
           }.
The symbol 'a' will invoke the transformation
<statement> => <assignment>
the symbol 'w' the transformation
<statement> => <while-statement>
and so on.
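Keystroke selection is a lookup from the entered symbol to the alternative it names. In this minimal sketch an alternation is represented, by assumption, as a list of `(keystroke, rule name)` pairs; an unmatched key is ignored, in keeping with the treatment of erroneous Phase Two input. The duplicate-key check is an added assumption (the text does not say how clashing keys are handled), shown here as something Phase One could reject. The names `check_keys` and `select` are hypothetical.

```python
from typing import List, Optional, Tuple

Alternation = List[Tuple[str, str]]     # (keystroke, rule name) pairs


def check_keys(name: str, alternatives: Alternation) -> None:
    """Assumed Phase One check: keys within one alternation must differ."""
    keys = [k for k, _ in alternatives]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keystroke in alternation {name!r}")


def select(name: str, alternatives: Alternation,
           key: str) -> Optional[Tuple[str, str]]:
    """Map an entered keystroke to the transformation it triggers."""
    for k, alt in alternatives:
        if k == key:
            return f"<{name}>", f"<{alt}>"
    return None     # unknown key: ignored, like any erroneous Phase Two input
```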
Extensions to this simple system are easy to imple-
ment and desirable. In particular, a string of more than
one character could be allowed as a key. Some work has been
done in allowing a "fall-through" key, which invokes the
indicated transition upon any symbol which does not occur
anywhere else in the list of alternative keys, and reapplies
the entered symbol to the next alternative generated. Such
enhancements are not considered further in the present work.

Thus, the only data which must be entered during
Phase Two are cursor control commands, which leave the syn-
thesized string intact but move the cursor, and invocations
of transformations, which consist of a single blank, fol-
lowed by nothing for e-symbol expansions (lists, iterations,
or optional field inclusion), by a context-dependent key-
stroke for alternative selections, and by whatever is needed
by the appropriate input scanner for such items as identi-
fiers, numbers, and the like.

5. Discussion.

We have now enhanced the capabilities of the GDSE on
the input side to allow string synthesis driven by a human-
oriented grammar, with a reasonably supple means of cursor
control and transformation selection. The resulting mechan-
ism still has the desirable properties of the GDSE: it can
accept virtually any context-free grammar (we have lost
those which contain irreducible recursions) and generate any
form derivable under that grammar (some of which are
automatically expanded). It is also still true that the
buffer never contains an incorrect sentential form.

The mechanism that has been described in this sec-
tion is considerably simpler than that for a parser genera-
tor. This simplicity is the result of allowing interaction
between the user and the synthesizer during the stage when
the grammar of the language is available to the mechanism.
User-provided data is available to guide a true top-down
synthesis of the desired word in the defined language.

The described system is highly useful in its own
right. It could be used, for instance, to prepare programs
for entry into a conventional system with the guarantee that
the program was syntactically correct. The compiler used
would not need the ability to handle syntactic errors (a
notably difficult design problem). In addition, since the
input grammar is interpreted, the same editor could be used
for many different languages.

We want to do more, however. In the next section,
we investigate one way to synthesize more complicated data
structures using the grammar-driven editor we have described
in this section.


E. TREE SYNTHESIS 
So far, all of the mechanisms described synthesize
strings. In order to subsume the ideas already developed
under the general notion of tree synthesis, we first charac-
terize strings as a special sort of tree. We then discuss
the notion of parse trees, and generalize it to form the
more general class of derivation trees, of which both string
trees and parse trees are special cases. Since trees are a
well-understood data structure, we shall not define them
formally but treat their general properties in an intuitive
fashion. For the remainder of this section we shall assume
that the algorithms necessary to create and manipulate gen-
eralized (multi-child), ordered trees are freely avail-
able. Such trees consist of a finite number of nodes, each
of which has a finite number of children occurring in an
ordered sequence.

In addition to having children, we assume that each node
may also contain an indefinite amount of symbolic informa-
tion. In particular, with each node may be associated a
string called its label.


Those nodes of a tree with no children are its leaf
nodes. Since the tree is ordered, its leaf nodes may also
be ordered into a linear list. We assume that all of the
nodes of a synthesized tree may be examined and accessed for
the information they may contain.
1. Re-interpretation of the GDSE.
In all of the work that follows, we use a syn-
thesizer that is formally identical to the GDSE. We shall
call such a mechanism a GDE, for Grammar-Driven Editor. The
actions taken by those steps in the algorithm that actually
interact with the BUFFER are reinterpreted as calls to
tree-manipulation subroutines. The BUFFER is now conceived
to contain, not strings of symbols, but appropriately imple-
mented ordered trees with labeled nodes. Rather than
describing the algorithms involved to create, modify, and
traverse such structures in detail, we assume that mathemat-
ically correct subroutines are available to perform the
needed functions, since methods for implementing trees using
a sequentially addressed, rewritable memory store are well
known.

In order to reinterpret the improved GDSE as a tree
synthesizer in this way, we need routines to initialize the
BUFFER with a target tree (or initial tree), move the cursor
back and forth, and replace a "symbol" with a "string of
symbols" (whatever these terms mean in the new context).
Also, we now need to explicitly identify the precise means
used to "display" a tree.

Supposing that appropriate routines are available,
we wish to argue that the new mechanism, which synthesizes
trees instead of strings, inherits all of the formal pro-
perties of the original, in the following sense.

The display algorithm in use may be thought of as a
function, d, mapping trees into strings. We shall consider
a tree to be a "sentential form" of the input grammar of
interest if, and only if, its image is a string which is a
sentential form of the grammar.
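The text leaves d abstract. One natural choice, assumed here only for illustration, reads the ordered list of leaf labels from left to right. The minimal Python sketch below (the `Node` class is a hypothetical stand-in for the assumed tree routines) shows that under this choice a flat string tree and a parse tree can display as the same sentential form.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def d(tree: Node) -> List[str]:
    """Assumed display function: the leaf labels, left to right."""
    if not tree.children:
        return [tree.label]
    out: List[str] = []
    for child in tree.children:
        out.extend(d(child))
    return out
```

For example, the parse tree `Node("<expr>", [Node("<expr>", [Node("<term>", [Node("x")])]), Node("+"), Node("<term>")])` and the string tree `Node("ROOT", [Node("x"), Node("+"), Node("<term>")])` both display as `['x', '+', '<term>']`.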

We wish to compare the operation of the old and the
new mechanisms, given exactly the same stream of input sym-
bols on the PHASE2 INPUT tape, supposing that the grammar
specifications on the PHASE1 INPUT tape are equivalent in
some as yet unspecified sense. The fundamental property
that gives the GDSE all of the features that make it an
appropriate synthesizer for sentential forms is that at each
entry to the loop, the BUFFER always contains a correct
form. This property is a consequence of the fact that the
manipulations inside the loop either leave the contents of
the buffer unchanged, or transform one valid form to
another. Since the BUFFER is initialized with a valid form,
by induction the BUFFER never contains anything but a valid
form upon loop entry.

We would like the new mechanism to perform the same
derivation steps, given the same PHASE2 input sequence, as
the old. The display function would then serve as a mor-
phism from the new mechanism to the old, over the operations
defined by the possible BUFFER transactions made available
by the algorithm within its basic loop. Suppose, then, that
on every cycle through the loop by the parallel mechanisms
the two BUFFERS hold identical forms at the beginning of the
loop (as viewed under the display function for the new
mechanism), and that corresponding derivations are
undertaken within the loop. Then for every possible
derivation sequence that can occur under the old mechanism
there will be one, and only one, derivation sequence which
occurs under the new mechanism, and the product of the new
mechanism, when viewed under the display function, will be
identical to that of the old.

The question of paramount interest is: under what circumstances will this property, that the contents of both BUFFERS are display-equivalent at every step of equivalent machines, be true?

It is well outside the scope of our research to provide a complete answer to this question, in the form of a set of necessary and sufficient constraints under which the desired property (which we might call "stepwise equivalence") holds. Rather, we shall describe in general terms one natural class of reinterpretation constraints that are merely sufficient.

In the improved GDSE, the PHASE1 INPUT tape contained a finite set of rules, each of which consisted of a finite set of transformations with one symbol on the left-hand side and a string of symbols on the right-hand side. In the reinterpreted synthesizer, each transformation will consist of a specification calling for the replacement of a single leaf node, labelled with the symbol on the left-hand side of the original transformation, with a forest of adjacent siblings whose leaf nodes are labelled with each of the symbols on the right-hand side. Such a tree transformation specification will be referred to as a template. "Replacement of a symbol by a string" is now taken to mean the replacement of a labelled leaf node by the forest of adjacent siblings specified by the appropriate template.

In order to ensure that the structure in the BUFFER is always a tree (since we may allow replacement of a node by a forest), it is necessary to ensure that the root node in the BUFFER is never broken up into a forest. We therefore impose the constraint on the system that the BUFFER be initialized with a tree consisting of a special root node with one child, labeled with the target symbol. Only leaf nodes are ever replaced, and no replacement ever turns a previously internal node into a leaf node (no transformations have empty right-hand sides). Since the root node is initially internal, it is never replaced. Hence the structure in the BUFFER is always a bona fide tree.
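Template application and the special root can be sketched as follows. The `Node` class, the `apply_template` helper and the `<root>` label are illustrative assumptions; a template is supplied here as a ready-made, non-empty forest of `Node` objects, which may include interior nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def leaves(node):
    """Leaf nodes of a tree in left-to-right order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def apply_template(root, leaf, forest):
    """Replace `leaf` in place by `forest`, a non-empty list of adjacent siblings.

    Only leaves are replaced and forests are never empty, so the special
    root (internal from the start) is never replaced and stays a single tree.
    """
    if leaf is root or leaf.children or not forest:
        raise ValueError("only a non-root leaf may be replaced by a non-empty forest")
    stack = [root]
    while stack:
        parent = stack.pop()
        for i, child in enumerate(parent.children):
            if child is leaf:
                parent.children[i:i + 1] = forest
                return
            stack.append(child)
    raise ValueError("leaf is not in the tree")


# The BUFFER starts as a special root with one child labelled with the target.
buffer = Node("<root>", [Node("<stmt>")])
apply_template(buffer, leaves(buffer)[0],
               [Node("if"), Node("<expr>"), Node("then"), Node("<stmt>")])
print([n.label for n in leaves(buffer)])  # ['if', '<expr>', 'then', '<stmt>']
```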

The above suppositions are insufficient by themselves to obtain the stepwise equivalence property, since we have not addressed the display function, which is used to define what is meant by a tree which is a valid sentential form.

In the final system to be described, the language implementer will be given the power both to select a particular template, from all of the valid candidate templates available, corresponding to the given transformation, and also to influence the display order of the children of a given node. The retention of stepwise equivalence depends jointly on the consistent application of these facilities, and it is our present intention to provide a sufficient condition which does, in fact, preserve it.

Selection of a single template for each transformation in the original grammar may be thought of as specifying a function mapping transformations into templates. Let us name this function f.

In the work immediately following, the display algorithm will be very simple. A tree is displayed by listing the labels of all of its leaf nodes in order. Since the right-hand sides of templates are ordered forests, we may also speak consistently of applying d to a template: again, we simply list all of the leaf node labels in order. The required constraint is simply this: f and d must be inverse functions on the set of transformations in the grammar and the selected templates. That is, each template must display as the transformation to which it corresponds. Finally, movement of the cursor back and forth is to be interpreted as movement of the cursor from leaf node to leaf node, as ordered under the display function.
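A small sketch of the display function d and of the constraint on f follows. It assumes transformations are written as (left-hand side, right-hand side) pairs and templates as lists of `Node` objects; the names `d`, `f`, `satisfies_constraint`, `move_cursor` and the interior label `<cond>` are hypothetical. The example template has an interior node, yet it still displays as its transformation, which is all the constraint requires.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def d(forest):
    """Display function: the labels of all leaf nodes of a forest, in order."""
    out = []
    for node in forest:
        out.extend(d(node.children) if node.children else [node.label])
    return out


# Hypothetical transformation and the template chosen for it by f.
IF_STMT = ("<stmt>", ("if", "<expr>", "then", "<stmt>"))
f = {IF_STMT: [Node("if"), Node("<cond>", [Node("<expr>")]),
               Node("then"), Node("<stmt>")]}


def satisfies_constraint(f):
    """Each template must display as the right-hand side of its transformation."""
    return all(tuple(d(template)) == rhs for (_lhs, rhs), template in f.items())


def move_cursor(tree, cursor, step):
    """The cursor is an index into the leaf order defined by d, clamped at the ends."""
    return max(0, min(len(d([tree])) - 1, cursor + step))


tree = Node("<root>", f[IF_STMT])
print(satisfies_constraint(f), d([tree]), move_cursor(tree, 3, 1))
```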

Under these conditions, stepwise equivalence will be retained by the new mechanism. The fundamental reason for this is that the display algorithm defined is, itself, "context-free": the display of a tree is the concatenation of the displays of its subtrees, so replacing a leaf by a template forest that displays as the right-hand side of a transformation replaces exactly that one symbol in the displayed string. If a given tree is a sentential form, application of a template to it will therefore yield a tree which is also a sentential form. Moreover, the new tree will display as the same form as that yielded by the corresponding symbol replacement applied by the string synthesizer. Cursor movement also takes place in parallel.
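The stepwise-equivalence argument can be checked mechanically on a toy grammar. The sketch below, with a hypothetical grammar and helper names, runs a string BUFFER and a tree BUFFER side by side on the same (position, choice) inputs and asserts after every pass that the tree displays as the string. It uses parse-tree templates for the tree side; any templates that display as their transformations should pass the same check.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


GRAMMAR = {  # hypothetical toy grammar
    "<stmt>": [("if", "<expr>", "then", "<stmt>"), ("skip",)],
    "<expr>": [("x",), ("y",)],
}


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def lookup(symbol, choice):
    """The chosen right-hand side, or None if the request must be rejected."""
    rules = GRAMMAR.get(symbol)
    if rules is None or not 0 <= choice < len(rules):
        return None
    return rules[choice]


def string_step(buf, pos, choice):
    rhs = lookup(buf[pos], choice) if 0 <= pos < len(buf) else None
    return buf if rhs is None else buf[:pos] + list(rhs) + buf[pos + 1:]


def tree_step(root, pos, choice):
    # Parse-tree template: the chosen leaf gains one child per right-hand-side symbol.
    lv = leaves(root)
    rhs = lookup(lv[pos].label, choice) if 0 <= pos < len(lv) else None
    if rhs is not None:
        lv[pos].children = [Node(sym) for sym in rhs]
    return root


def run_in_parallel(moves, target="<stmt>"):
    buf, tree = [target], Node("<root>", [Node(target)])
    for pos, choice in moves:
        buf, tree = string_step(buf, pos, choice), tree_step(tree, pos, choice)
        # Under the display function, the tree BUFFER equals the string BUFFER.
        assert [n.label for n in leaves(tree)] == buf
    return buf, tree


final, _ = run_in_parallel([(0, 0), (0, 0), (1, 1), (3, 1)])
print(final)  # ['if', 'y', 'then', 'skip']
```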

Since the new mechanism is stepwise equivalent to the old, it inherits all of the formal properties of the old. Of course, since the actual contents of the BUFFER may be substantially richer in structure at any given time, the new mechanism may have emergent properties of its own in addition to those inherited from the GDSE, but such properties can be utilized only by using an additional algorithm to access information that has been hidden in internal nodes of the tree in the BUFFER.

A more flexible display algorithm will be used in the final system. The implementer will have the power to permute the display order of the nodes in a template, as well as to display strings stored with the rule instead of as labels of a node. The display algorithm retains the basic property of providing a context-free display, however, and the same constraint applies to the display and template specifications chosen: each template must, in fact, display as its corresponding transformation in order for the system to maintain stepwise equivalence.
2. Strings as Trees.

We may think of a string as a special sort of tree which has a root node and one child for each symbol in the string. Such a two-level tree we shall call a string tree. For instance, the string

   "if <expression> then <statement> o(else-part)"

corresponds to the string tree

                          <root>
        +-----------+-------+-------+-------------+
        |           |       |       |             |
       if    <expression>  then  <statement>  o(else-part)

In order to synthesize string trees with a GDE, we initialize the BUFFER with the tree

      <root>
        |
     <target>

Replacement of a symbol by a string of symbols is redefined as the replacement of a leaf node by a list of adjacent sibling nodes, fitted into the place of the replaced node in the ordered list of leaf nodes. In other words, the template corresponding to a given transformation is just an ordered forest of single-node trees.

The resulting GDE, although it does synthesize trees, constitutes a system that is isomorphic to the GDSE.

3. Parse Trees.
The concept of a parse tree occurs frequently in the theory of context-free grammars.
We can view parse trees as the structures synthesized by another reinterpretation of the basic grammar-driven synthesizer. The initial tree is taken to be the same two-node tree as for the case of string trees. The notion of replacement of a symbol by a string is reinterpreted as the addition of children to a leaf node, labeled with all the symbols of the string. In other words, templates always take the form of a tree, with the root node labeled with the left-hand side of the transformation and each child labeled with the appropriate symbol from the right-hand side. As usual, the "string" in the BUFFER is the ordered list of leaf nodes. The resulting structure is considerably richer than that retained in the BUFFER by the GDSE, since once a node is created, it is never removed. (More accurately, if it is removed while a leaf node, it is immediately replaced by a copy of itself.)
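The two reinterpretations can be compared on the same derivation. In this hypothetical sketch (the helper names and the `<id>` rule are assumptions), a string tree splices each right-hand side into the root's child list, while a parse tree hangs it beneath the expanded leaf.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def size(node):
    return 1 + sum(size(child) for child in node.children)


def expand_string_tree(root, pos, rhs):
    """String tree: the pos-th leaf is replaced by adjacent siblings under the root."""
    root.children[pos:pos + 1] = [Node(sym) for sym in rhs]


def expand_parse_tree(root, pos, rhs):
    """Parse tree: the pos-th leaf stays in place and gains one child per symbol."""
    leaves(root)[pos].children = [Node(sym) for sym in rhs]


# Hypothetical derivation: (leaf position, right-hand side) for each step.
derivation = [(0, ["if", "<expr>", "then", "<stmt>"]),  # <stmt> => if <expr> then <stmt>
              (1, ["<id>"]),                            # <expr> => <id>
              (1, ["x"])]                               # <id> => x

string_tree = Node("<root>", [Node("<stmt>")])
parse_tree = Node("<root>", [Node("<stmt>")])
for pos, rhs in derivation:
    expand_string_tree(string_tree, pos, rhs)
    expand_parse_tree(parse_tree, pos, rhs)

print([n.label for n in leaves(string_tree)] == [n.label for n in leaves(parse_tree)],
      size(string_tree), size(parse_tree))
```

Both trees display as `if x then <stmt>`, but the parse tree needs 8 nodes to the string tree's 5, because it also keeps `<stmt>`, `<expr>` and `<id>` as interior nodes.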

4. Comparison of String Trees and Parse Trees. 

We take the view that string trees and parse trees are two special cases of a whole range of trees that can represent a particular sentential form. This observation can be justified by comparing the properties of the two types of trees. A string tree incorporates the minimum amount of historical information concerning the derivation sequence by which it was produced: just enough for further derivation to proceed correctly. As a result, string trees are very compact.
Parse trees, on the other hand, incorporate a very large amount of information concerning the derivation sequence by which they were produced: enough so that the entire sequence can be reconstructed (down to the permutation of commutative nonterminal selection). As a result, parse trees are very large. As a concrete example, Figure 1 in Appendix H contains the parse tree for a trivial PASCAL program.

Our eventual goal is to provide for grammar-driven synthesis of directly evaluable trees of reasonable size. A secondary goal is to do this in such a way that the resulting tree can be displayed as a program in the language in which it was created, but can be evaluated without any additional syntactical access.

Neither string trees nor parse trees are suitable constructs for achieving these goals. String trees incorporate no structural information and must be reparsed in order to access their semantic contents in the correct order. (This process may even be impossible if the string tree was synthesized under an ambiguous grammar.) Too much information has been discarded at the time of synthesis.

On the other hand, parse trees are unreasonably large. Most of the nodes record syntactical information that is semantically content-free.
Our task, therefore, is to find a way to reach some middle ground, synthesizing trees which contain enough nodes to retain the desired control structure, but allowing the elimination of nodes which have no semantic content.

The purpose of the present section is not to provide a complete description of how this is to be done, but to provide a conceptual range of intermediate possibilities. It will then be possible to choose intelligently the sort of tree to be synthesized to meet a particular requirement. In short, we wish to introduce some "engineering slack" into the formal system.

This purpose is realized by introducing the notion of derivation trees, a general concept of which both parse trees and string trees are special cases.

5. Derivation Trees.

One way to characterize the structure of a parse tree is to note that every parent node in the tree derives its children in exactly one step. Thus, the relation between parents and children in the tree is the same as the "=>" relationship.

We now consider the set of trees in which each parent derives its children in zero or more steps; that is, trees whose parent-child relation incorporates the "*=>" relationship.

Such trees may be constructed from a parse tree in the following manner:

   a. Mark the root and leaf nodes.

   b. Mark zero or more of the remaining nodes.

   c. Discard each unmarked node. Every time a node is discarded, replace it within the set of its siblings by all of its children, taken now as adjacent siblings. (This procedure preserves the relative ancestry of all undiscarded nodes.)


The above procedure assures that every remaining node derives its new children in zero or more steps. This can be seen by noting that the hypothesis is true for the original parse tree, and that, if it is true for a discarded node and its children, then during each application of the third step it remains true for that node's parent and the parent's new children, since the parent derives the discarded node (among its siblings) and the discarded node in turn derives its children. Hence, it is true for the resulting tree.
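The marking procedure can be sketched as a recursive pruning pass. Here a caller-supplied `keep` predicate plays the role of step (b); the `Node` class, the helper names and the example parse tree are hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def prune(node, keep):
    """Forest that takes the place of `node` once unmarked interior nodes are discarded.

    Leaves are always marked (step a). An unmarked interior node is replaced,
    within its set of siblings, by its own pruned children (step c), so the
    relative ancestry of every surviving node is preserved.
    """
    kids = [x for child in node.children for x in prune(child, keep)]
    if not node.children or keep(node):
        return [Node(node.label, kids)]
    return kids


def derivation_tree(parse_root, keep):
    # The root is always marked (step a); `keep` marks other interior nodes (step b).
    return Node(parse_root.label,
                [x for child in parse_root.children for x in prune(child, keep)])


def leaf_labels(node):
    if not node.children:
        return [node.label]
    return [lab for child in node.children for lab in leaf_labels(child)]


# Hypothetical parse tree for "if x then skip".
parse = Node("<root>", [Node("<stmt>", [
    Node("if"), Node("<expr>", [Node("<id>", [Node("x")])]),
    Node("then"), Node("<stmt>", [Node("skip")])])])

full = derivation_tree(parse, lambda n: True)     # retain everything: the parse tree
flat = derivation_tree(parse, lambda n: False)    # retain nothing: a string tree
by_label = derivation_tree(parse, lambda n: n.label == "<stmt>")
print(leaf_labels(by_label))
```

The two extreme predicates reproduce the parse tree and the string tree; the third keeps only `<stmt>` nodes, and every result displays as `if x then skip`.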

In the procedure just specified, the selection of interior nodes to be retained is done non-deterministically. It is the specification of the particular algorithm to be used for selecting nodes for retention that we make available to the system implementer as an engineering choice. The two simplest algorithms are to retain all interior nodes, in which case parse trees are produced, or to discard all interior nodes, in which case string trees are produced.

The trees produced by the procedure just described we call generalized derivation trees. Our goal, however, is not to produce a full parse tree and only then to prune it, but to synthesize a pruned derivation tree directly as we go along.

This desire suggests that we apply a particular synthesis uniformly, in the sense that with each transformation implicit in the R-ARGOT grammar there be associated one, and only one, synthesis action. This suggestion is not quite a necessary implication: one could conceive of some history- or context-dependent algorithm for selecting one of several predefined synthesis actions associated with a transformation. In fact, such "intelligent" systems are an interesting subject for future research.

But if the simpler protocol is adopted, we obtain a sub-class of derivation trees, which we call derivation trees constructed by rule. Both parse trees and string trees are also members of this class. Hereafter, the term "derivation tree" will be understood in this restricted sense.

The association of one, and only one, template with each transformation is very clearly an embodiment of this idea. The GDE previously described is thus a mechanism capable of synthesizing any class of uniform derivation trees desired for a given grammar in R-ARGOT.

In essence, the next chapter represents the selection of further constraints on the template formats to be associated with each type of transformation in such a way that our design goals are achieved. The trees produced under this set of protocols are a particular sort of derivation tree constructed by rule, which we shall hereafter call abstract syntax trees. This name is adopted from the ideas contained in (McKeeman 1970) as representing an intermediate stage in the translation of some program in which a parse tree has had its syntax-dependent, semantically void interior nodes pruned away.

6. Elimination of Terminal Strings in Derivation Trees. 

An inspection of parse trees such as the one displayed in Figure 1 suggests three general classes of nodes for elimination: those representing a series of production steps needed to fill a high-level slot with a low-level construct (so-called "empty productions"); those encoding options available but not so far taken (e-symbols); and those representing keywords and punctuation.

As the next chapter shows, selection of appropriate template protocols allows removal of nodes representing empty productions. It is our belief that nodes of the second type can also be eliminated by appropriate template selection and by context-sensitive computation of the existence of a "virtual" option.

We now investigate a methodology for eliminating most nodes required to hold terminal strings.

We first make the observation that most such nodes are semantically content-free. An examination of the R-ARGOT notation will show that terminal symbols can only be added to a synthesis in one of two ways: by means of a concatenation or list-iteration transformation, or by means of a predefined (autoparsed) rule name expansion. In the second case, the included string may well be meaningful, e.g., if it is an identifier or the like. In the former case, however, since the required terminal string cannot be an optional field, there is no choice as to whether the string is included. If such a choice existed, it must have been made via an earlier option or alternative selection, and, by the template protocols specified in the next chapter, this selection is already encoded into the structure of the tree. There is thus no reason to add a node to the tree simply to represent an invariant field.

On the other hand, in order to be usable we must be able to display the string as if it were a node in the tree. The solution to this quandary is to make provision for computing the location and contents of such virtual fields when the need arises. This can be done, provided that list and concatenation rule templates always have a single head node which can be associated in some way with the specific rule from which they were derived (either by inserting a reference to the rule into the node or by computing the rule from context). If the contents of the virtual fields associated with the rule are then stored with the rule, we can avoid repeating these strings throughout the derivation tree.
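The virtual-field idea can be sketched with head nodes that carry a reference to their rule, while the terminal strings live only in a rule table. The table layout (None marking a nonterminal slot), the rule names and the `Node` fields are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical rule data base: for each list/concatenation rule, its items in
# order, with None marking a nonterminal slot and strings giving the
# invariant terminal fields, stored once with the rule rather than in the tree.
RULES = {
    "if_stmt": ["if", None, "then", None],
    "assign": [None, ":=", None],
}


@dataclass
class Node:
    rule: Optional[str] = None   # head nodes: reference to the rule they came from
    text: Optional[str] = None   # other leaves, e.g. strings from predefined rules
    children: list = field(default_factory=list)


def display(node):
    """Display a subtree, computing the virtual terminal fields from the rule."""
    if node.rule is None:
        return [node.text]
    out, kids = [], iter(node.children)
    for item in RULES[node.rule]:
        out.extend([item] if item is not None else display(next(kids)))
    return out


def size(node):
    return 1 + sum(size(child) for child in node.children)


tree = Node("if_stmt", children=[
    Node(text="b"),
    Node("assign", children=[Node(text="x"), Node(text="1")]),
])
print(" ".join(display(tree)), size(tree))
```

The tree displays as `if b then x := 1` with five nodes; storing the terminals `if`, `then` and `:=` as nodes would have added three more.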
These ideas are discussed more concretely in the protocols for template construction in the next chapter.


F. COMPARISON OF GRAMMAR-UTILIZATION TECHNOLOGIES

It is appropriate at this point to step back and place the system of grammar utilization described in this chapter within the range of currently available technologies for grammar utilization. We shall compare this system with the two common parsing techniques: bottom-up and top-down parsing. All three of these techniques may be thought of as producing derivation trees as output.

It should be recognized that the tree produced by a parser in contemporary translation systems is usually "virtual". The parser emits a series of syntax-directed action commands which may be thought of as the sequential representation of a post-order traversal of a derivation tree. The "back end" of the system may be thought of as traversing behind the parser, destroying nodes as quickly as they are built.

Both of the parsing techniques are designed to proceed automatically, that is, without any human intervention. The grammar-driven synthesizer, in comparison, is inherently interactive. This property is both an advantage and a disadvantage, in that the synthesizer utilizes interaction to attain desirable goals, but cannot be implemented without interactive devices being available.
The need for the parser-oriented techniques to proceed automatically places a set of mathematical constraints on the grammars usable by such systems. The grammar-driven synthesizer is capable of utilizing almost any context-free grammar, a capability that allows the language designer to optimize the grammar selected for realizing some programming language towards a set of semantically natural rules which will be easy for the human user to understand.

The parser-based systems are essentially decoders, translating a valid word in the defined language into a more complicated, but equivalent, structure. Inherent in this process is the requirement for the user to use some other system, such as a keypunch or text editor, to formulate a valid input word in sequential form: a notoriously error-prone and tedious process. In contrast, the grammar-driven synthesizer allows the user to create the desired tree structure directly and with no possibility of syntactic error (since such errors are simply rejected immediately).

Finally, we note that both parsing techniques synthesize the output tree from the bottom up. The grammar-driven synthesizer follows a true top-down synthesis: thus, the partially complete structure is completely well-structured so far as it goes. For this reason the system is well suited as a base for dealing with partially complete programs.
III. CONCEPTUAL DESIGN FOR GDE


A. INTRODUCTION 

In this chapter a conceptual design for a Grammar Directed Editor is developed within the framework defined in Chapter II.

The mathematical model provides a large framework in which to design a Grammar Directed Editor, subject to the following restrictions:

1. Grammar rules are limited to the concatenation, alternation, iteration, list, predefined, and undefined rules in the forms specified by the R-ARGOT notation.

2. The templates associated with these grammar rules may consist of arbitrary forests of siblings, the leaves of which must be labelled in accordance with the transformations summarized in Figure 2.

3. The templates for list and concatenation rules which include terminal symbols must create head nodes which retain or refer to those terminal symbols for display.

A Grammar Directed Editor constructed in accordance with these restrictions will produce a derivation tree whose leaves and terminal symbols, retained in head nodes, are displayable as a valid derivation of the input grammar.

The following design restrictions and goals serve as a basis for limiting the very general nature of the possible templates to a set of generic templates which define the permissible transformations available for the construction of an Abstract Syntax Tree (AST):

1. The AST should contain the minimum number of nodes consistent with the retention of all necessary semantic and schematic information.

2. The structure of the AST should admit efficient editing algorithms, in particular for append, delete, and insert functions.

3. The AST should not only be an evaluable structure, but should further require no "preprocessing" between editing and evaluation operations.

4. The generic transformation template structure should be such that the creation of specific templates for a given grammar can be automated over the simplest possible input data, perhaps as simple as a grammar in a suitable notation.

The methodology employed in the design process described in the following section is to apply, working within the constraints which the mathematical model suggests, such further constraints and definitions as may be necessary to develop generic templates for each transformation which realize the design goals. In Section C, a method for displaying the AST is developed which is consistent with the generic templates as well as with the requirement that the valid derivation which the AST represents be displayable as such. Section D introduces the notion of a Language Definition, wherein an R-ARGOT grammar is translated into an ordered collection of transformation templates and display schemas which serves as the basis for the construction and display of an AST.


B. TRANSFORMATIONS 
1. Operators and Rulenames
Figure 2 is the result of precisely defining the leaves produced by each of the transformations defined in Chapter II.
A simple change in notation produces Figure 3, wherein every rulename in a transformation is associated with an operator to form a two-part label, as follows:


   <r>      =  NT,r
   copt(r)  =  COPT,r
   iopt(r)  =  IOPT,r
   lopt(r)  =  LOPT,r
   pdf(p)   =  PDF(p),p


where r is any grammar rulename and p is any predefined rulename. The first part of a label, the operator, will guide future transformations. The second part, the rulename, serves as a reference to that section of the language-specific data base containing the information required for performing transformations or display. In other words, labels may be thought of as a self-modifying "program" for the Grammar Directed Editor, stored in the hierarchical AST structure by previous versions of the program, encoding all of the information necessary for subsequent modification or display of the structure.

Note that, as a result of the notational convention adopted here, the set of possible labels is finite over a finite set of grammar rules and, therefore, the set of templates required for such a grammar is also finite. Further, the type of transformation which may be applied to a given node is determined entirely by the operator and rule type association stored within that node.

The alternation and predefined transformations present a problem, however: although the "NT" opcode is usually stored in transient nodes, these two particular transformations must be stored in free nodes. The alternation requires that the user select one of the possible alternatives, and the predefined functions require that the user input a string which they then process. This irregularity is resolved by the introduction of two new operators, ALT and TERM, and the following pairs of transformations:

   NT,a    =>  ALT,a

   ALT,a   =>  { NT,r1 | ... | NT,rn }

   NT,p    =>  TERM,p

   TERM,p  =>  PDF(p),p
The operators "ALT" and "TERM" may be thought of as logically equivalent to "NT", but as explicitly labelling (for display purposes) the nodes as free (for synthesis purposes). Figure 4 reflects these modifications to the general transformation table.

The introduction of the two new labels ALT,a and TERM,p, while not altering the leaves produced by the original transformations, and thus not violating the validity of the mathematical model's results for systems based on this extension, provides the following benefits:

   a. The format for the five defined types of template sets is more regular. At least two transformations are associated with each rule type. The first of these transformations is, in every case, a required transformation. The second and following transformations require some form of interaction with the user.

   b. Every node whose label has an "NT" operator may be automatically expanded during the autoscan process. Thus, after autoscan, the only leaves whose labels contain the "NT" operator will be those corresponding to undefined rules.

   c. Since for every unique label there is one and only one transformation possible, no contextual information need be extracted from the AST in order to select and perform the correct transformation. This simplifies the tasks both of language implementation and of AST formation, since production and invocation of a transformation template is independent of any AST contextual considerations.
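Benefits (b) and (c) can be sketched as a transformation function that looks only at a label and any user input. Labels are modelled as Python tuples, the grammar data base is hypothetical, the processed predefined string is carried as a third tuple element (an assumption), and the concatenation, iteration and list templates are omitted. The COPT case uses the implicit template described later in this section.

```python
# Hypothetical grammar data base: rulename -> (rule type, rule data).
GRAMMAR = {
    "stmt": ("alternation", ["assign", "while"]),
    "ident": ("predefined", str.isidentifier),
}

# Operators whose transformation needs no user interaction.
AUTOMATIC = {"NT"}


def transform(label, user_input=None):
    """Apply the one transformation defined for a label (operator, rulename).

    Only the label and any user input are consulted; no AST context is needed.
    Returns the new label, or the old one if the user input is not valid.
    """
    op, rule = label[:2]
    if op == "COPT":                      # implicit template, same for every rule
        return ("NT", rule) if user_input else label
    rtype, data = GRAMMAR[rule]
    if op == "NT" and rtype == "alternation":
        return ("ALT", rule)              # NT,a => ALT,a
    if op == "ALT":                       # ALT,a => NT,rk if the input is valid
        if isinstance(user_input, int) and 0 <= user_input < len(data):
            return ("NT", data[user_input])
        return label
    if op == "NT" and rtype == "predefined":
        return ("TERM", rule)             # NT,p => TERM,p
    if op == "TERM":                      # TERM,p => PDF(p),p
        if isinstance(user_input, str) and data(user_input):
            return ("PDF", rule, user_input)
        return label
    raise NotImplementedError(f"no template sketched for {label}")


def autoscan(label):
    """Apply automatic (NT) transformations until user interaction is needed.

    Labels for rules missing from the data base stand in for undefined
    rules and are left as NT leaves.
    """
    while label[0] in AUTOMATIC and label[1] in GRAMMAR:
        label = transform(label)
    return label
```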
2. Transformation Restrictions

The transformations as discussed so far define only the leaves of a possible forest of siblings which is to replace a particular node of the AST. We now turn our attention to designing the interior structure, if any, of the forests generated by the transformation templates. In the absence of other design goals or restrictions, the driving motivation in determining the forest structure is to obtain as much simplicity and economy of space as possible. These goals must be balanced against the necessity to retain semantic or schematic information to preserve the valid derivation property, as well as to retain sufficient structural information so that insertion and deletion editing functions may be convenient for the user as well as efficient algorithmically. The requirement to be able to delete synthesized subtrees turns out to constrain the template structures in such a way that the other goals are also met.

In order to recover gracefully from erroneously constructed portions of the AST, the user should have the capability to delete any node in the AST, which, as for any hierarchical structure, inevitably involves the ability to delete any subtree. The valid derivation property of the AST requires that deletion of a subtree from an AST be realized as the replacement of the entire subtree by a node which can validly derive that subtree and which also forms a valid derivation with the remainder of the AST. The choice of the transformation to be applied to a node in the AST is based solely on the information contained in the node itself and is completely independent of the node's context. Therefore, deletion of a subtree must be equivalent to replacement of that subtree by a node with the same label, that is, the same operator and rulename, as the node which was expanded to form the deleted subtree contained when that node was originally created. The constraints provided by the abstract model of Chapter II are not sufficient to guarantee that this can be consistently and efficiently accomplished. For example, consider a grammar which has only concatenation rules, each of which consists either entirely of nonterminal symbols or entirely of terminal symbols. Since the model allows templates without a head node to be defined for concatenation rules which have no terminal symbols, the tree derived from such a grammar could be a string tree, containing no information for reconstructing a node being considered for deletion. The only action possible for a deletion algorithm in this case would be to delete the entire tree. However, consider the effect of the following proposed restrictions:

   a. All immediate children of a (necessarily bound) node must be created by the transformations of the rule by which their father was bound.

   b. When a node is bound, the rule whose transformation bound the node is permanently recorded in the node.

   c. A given transformation may generate two or more childless siblings, or a subtree of the current node, but not both.

   d. If a subtree is created by a transformation, it is limited to at most a single generation of children and may consist of a single node.

Given these restrictions, the rule (and therefore, at worst, a choice between two transformation templates) which originally created any given node in the AST can be identified by examining its father. Computation on the father's rule templates allows retrieval of the unique node from which the subtree to be deleted was formed. This uniqueness is discussed further below.
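Deletion under restrictions (a) and (b) can be sketched as follows, assuming a fixed-arity concatenation template whose k-th child always comes from the k-th nonterminal or optional item of the rule (list rules with a variable number of children would need more computation). The `CHILD_LABELS` table, the `HEAD` operator and the rule names are hypothetical. Note that deleting an optional part that the user had taken restores its original COPT label, not an NT label.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical template data: for each concatenation rule, the labels its
# template gives the head node's children, in order.
CHILD_LABELS = {
    "if_stmt": [("NT", "expr"), ("NT", "stmt"), ("COPT", "else_part")],
}


@dataclass
class Node:
    label: tuple                  # (operator, rulename)
    rule: Optional[str] = None    # restriction (b): recorded when the node is bound
    children: list = field(default_factory=list)


def delete_child(father, k):
    """Replace the k-th child's subtree by the node it was originally expanded from.

    By restriction (a) the child was created by the father's rule, and by (b)
    that rule is recorded in the father, so the original label is recomputed
    from the father alone, with no wider AST context.
    """
    removed = father.children[k]
    father.children[k] = Node(CHILD_LABELS[father.rule][k])
    return removed


head = Node(("HEAD", "if_stmt"), "if_stmt", [
    Node(("HEAD", "rel"), "rel"),
    Node(("HEAD", "assign"), "assign"),
    Node(("HEAD", "else_part"), "else_part"),  # optional part already taken
])
removed = delete_child(head, 2)
print(removed.label, head.children[2].label)
```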

3. Transformation Templates 

Given the restrictions developed in the previous section, we are prepared to define the forests produced by each of the eleven transformations. The notation utilized in the transformation templates below is defined in an appendix.

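a. Concatenation

Rules

   c ::= x1 x2 ... xn,   where each xk is <rk>, copt(rk), iopt(rk), lopt(rk), or a terminal string "tk"

Templates

   NT,c  =>  headop,c ( y1 ... ym )    if xk is a nonterminal or optional item for some k
   NT,c  =>  headop,c                  if every xk is a terminal string

   where y1 ... ym are the labels (NT,rk, COPT,rk, IOPT,rk, LOPT,rk) of the nonterminal and optional items of the rule, taken in order, and

   headop = { HEAD | predefined function }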


There are six cases to be considered in the transformation to be applied to the label NT,c:

              nonterminals   terminals   comment
   Case 1:    0              NO          undefined rule
   Case 2:    1              NO          useless production
   Case 3:    >1             NO          head required by delete
   Case 4:    0              YES         terminals only
   Case 5:    1              YES         head required by model
   Case 6:    >1             YES         head required by model

Case 1 corresponds to the undefined rule, wherein no right-hand side of the rule exists. The undefined rule transformation is discussed below.

In cases 3, 5, and 6 a head node must be created: in cases 5 and 6 by the mathematical model, for the retention of terminal information, and in all three cases by the restrictions defined for the deletion algorithm. In each case the head node replaces the nonterminal under transformation, and the nonterminal and/or optional children are realized as the immediate children of the head node.

In case 4 a head node retaining the terminal information replaces the nonterminal being transformed. Since there are no nonterminals in the grammar rule for which this form of the transformation is utilized, no children are created. Note that this node is bound, since it is transformed into a node which is not one of the label forms for which transformations are defined; in fact, this is the only bound leaf node form generated outside the realm of predefined functions.

Case 2 is the useless production. We could,
without violating any of the restrictions thus far imposed,
define this case of the transformation as a single node
replacement, i.e., as NT,c => NT,r1, thus avoiding the
creation of a head node carrying no information. However, we
see the useless production as a very rare and usually
unnecessary occurrence which does not justify the increased
algorithmic complexity required for its detection.
Therefore, it is treated in the same manner as cases 3, 5, and 6.
Implicit Template:


    COPT,r => NT,r


This label must be accompanied by some form of
user attention in order for the transformation to be invoked,
the nature of which is discussed in the next section.
Assuming for the moment that the user has elected to take
the option, the transformation applied is a single node
replacement wherein the operator COPT is overwritten with
NT and the rulename remains unchanged.

Note that the rulename in the COPT label may be
any of the six rule types, including undefined, which raises
the question of where to store the template for this
transformation. The solution is to make this transformation
implicit, that is, to apply the transformation without an
explicit template being stored in the grammatical database.
This may be done since the transformation is invariant over
all rules in any grammar, depending only on the requisite
user attention and the COPT operator.
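
The two concatenation templates can be illustrated with a short Python sketch. It is not the GDE's implementation: the `Node` class, the tuple encoding of an R-ARGOT rule (`("t", text)`, `("nt", name)`, `("opt", name)`), and the function names `expand_concatenation` and `elect_copt` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str                      # operator of the label, e.g. "NT", "HEAD", "COPT"
    rule: str                    # rulename stored in the label
    children: list = field(default_factory=list)


# Hypothetical encoding of a concatenation rule rc : x1 ... xN, one tuple per
# symbol: ("t", text) terminal, ("nt", name) nonterminal, ("opt", name) for [name].
def expand_concatenation(node, symbols, headop="HEAD"):
    """Template 1: NT,c => headop,c ( a1 ... aN )."""
    assert node.op == "NT"
    children = []
    for kind, value in symbols:
        if kind == "nt":
            children.append(Node("NT", value))
        elif kind == "opt":
            children.append(Node("COPT", value))
        # A terminal yields no child; it stays recoverable from the rule itself.
    # A head node is created in every defined case; with no nonterminals
    # (case 4) it is a bound leaf.
    return Node(headop, node.rule, children)


def elect_copt(node):
    """Implicit template: COPT,r => NT,r once the user elects the option."""
    assert node.op == "COPT"
    return Node("NT", node.rule)


simple = [("t", "program"), ("nt", "name"), ("nt", "decls"),
          ("opt", "externs"), ("nt", "block"), ("t", "end")]
head = expand_concatenation(Node("NT", "simple"), simple)
print([(c.op, c.rule) for c in head.children])
# [('NT', 'name'), ('NT', 'decls'), ('COPT', 'externs'), ('NT', 'block')]
```

Terminal symbols produce no children, which is why the rulename kept in the head node matters: it is the only route back to those terminals.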

b. Alternation
Rules
    ra : { char1:r1 ! char2:r2 ! ... ! charn:rn }
Template 1:


    NT,a => ALT,a


The transformation for the label NT,a is a single
node replacement; the operator NT is replaced with ALT,
and the rulename remains unchanged.

Template 2:

               NT,rk    if user input valid
    ALT,a  =>
               ALT,a    otherwise

This label must be accompanied by user input
indicating which of the alternatives is desired; suppose for
the moment it is the kth. The transformation applied is a
single node replacement wherein the operator ALT becomes NT
and the alternation rulename is overwritten with the
rulename of the kth alternative. If the user input does not
correspond to any of the alternatives, the transformation
returns the node unchanged.
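
A sketch of the two alternation templates, assuming a hypothetical table `ALTERNATIVES` that maps each alternation rulename to its keystroke/alternative pairs; `Node`, `expand_alternation`, and `choose_alternative` are likewise illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    op: str      # "NT" or "ALT" in this sketch
    rule: str    # rulename stored in the label


# Hypothetical table for: statement : { a:assignment ! c:conditional ! b:block }
ALTERNATIVES = {
    "statement": {"a": "assignment", "c": "conditional", "b": "block"},
}


def expand_alternation(node):
    """Template 1: NT,a => ALT,a (operator replaced, rulename kept)."""
    assert node.op == "NT"
    return Node("ALT", node.rule)


def choose_alternative(node, char, alternatives=ALTERNATIVES):
    """Template 2: ALT,a => NT,rk for a valid keystroke, else ALT,a unchanged."""
    assert node.op == "ALT"
    target = alternatives[node.rule].get(char)
    if target is None:
        return node                 # input matches no alternative
    return Node("NT", target)       # alternation rulename overwritten


alt = expand_alternation(Node("NT", "statement"))
print(choose_alternative(alt, "c"))   # Node(op='NT', rule='conditional')
print(choose_alternative(alt, "z"))   # Node(op='ALT', rule='statement')
```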
c. Iteration

Rules

    ri : (iteration of the single nonterminal r1; no terminals)
Template 1:


    NT,i => ITER,i ( NT,r1  IOPT,i )


While not required by the mathematical model, a
head node is created by the transformation for the label
NT,i to fulfill the deletion requirements. The two leaves
specified by the model are formed as the immediate children
of the head node, in which the operator NT was replaced by
ITER. A side effect of the invariant creation of a head
node is that, while inconsistent with the model, terminal
information applicable to every real child in the iteration
sibling string, as opposed to the trailing IOPT child, could
be included in the iteration rule if an appropriate
extension were made to the R-ARGOT notation.

Template 2:


    IOPT,i => NT,r1  IOPT,i


Triggered by the appropriate user input, the
transformation for the label IOPT,i replaces the node with a
pair of siblings which are the leaves required by the model.
Note that the rulename in the IOPT label is the same
rulename which bound its father. Thus, all children of the
ITER node, whether formed when the ITER node was bound or
subsequently when the IOPT node was expanded, are formed by
one of the transformations under the rulename stored in the
ITER node, as required.
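
The iteration templates can be sketched as below. The table `ITERATION_RULES`, which maps an iteration rulename to the nonterminal r1 it repeats, and the helper names are hypothetical. The point of the sketch is that `expand_iopt` needs nothing but the rulename in the IOPT label to find r1.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str
    rule: str
    children: list = field(default_factory=list)


# Hypothetical table: iteration rulename -> the single nonterminal r1 it repeats.
ITERATION_RULES = {"stmts": "statement"}


def expand_iteration(node, rules=ITERATION_RULES):
    """Template 1: NT,i => ITER,i ( NT,r1  IOPT,i )."""
    assert node.op == "NT"
    r1 = rules[node.rule]
    return Node("ITER", node.rule, [Node("NT", r1), Node("IOPT", node.rule)])


def expand_iopt(parent, index, rules=ITERATION_RULES):
    """Template 2: IOPT,i => NT,r1  IOPT,i, spliced into the sibling string."""
    iopt = parent.children[index]
    assert iopt.op == "IOPT" and iopt.rule == parent.rule
    r1 = rules[iopt.rule]            # the IOPT label alone identifies the rule
    parent.children[index:index + 1] = [Node("NT", r1), Node("IOPT", iopt.rule)]


it = expand_iteration(Node("NT", "stmts"))
expand_iopt(it, 1)
expand_iopt(it, 2)
print([(c.op, c.rule) for c in it.children])
# [('NT', 'statement'), ('NT', 'statement'), ('NT', 'statement'), ('IOPT', 'stmts')]
```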


d. List
Rules
    rl : # r1 , x .        where x ∈ { r2 | [r2] | "t" }
Template 1:
    NT,l => LIST,l ( NT,r1  LOPT,l )


The transformation for the label NT,l replaces
the operator NT with the operator LIST, forming a head node
as required by the model in the case that the second
right-hand-side argument of the grammar rule is a nonterminal,
and in every case by the deletion requirements. The required
leaves form a sibling string under the LIST node.


Template 2:

                 NT,r2    NT,r1  LOPT,l    if x = r2
    LOPT,l  =>   COPT,r2  NT,r1  LOPT,l    if x = [r2]
                 NT,r1    LOPT,l           if x = "t"

The transformation for this label has three
forms, as indicated, for the three possible cases. In all
cases, the LOPT node being transformed is replaced with a
sibling string as shown, the nodes of which are the required
leaves. As in the IOPT transformations, the LOPT label
carries the same rulename as its father so that all children
created under a LIST head node are derived from a common
parent rule.
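
A sketch of the list templates under similar assumptions: `LIST_RULES` is a hypothetical table giving, for each list rulename, the nonterminal r1 and the second argument x, encoded as `("nt", r2)`, `("opt", r2)`, or `("t", text)`. The three branches of `expand_lopt` mirror the three forms of Template 2.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str
    rule: str
    children: list = field(default_factory=list)


# Hypothetical table: list rulename -> (r1, x), where x is ("nt", r2),
# ("opt", r2) for [r2], or ("t", text) for a terminal separator.
LIST_RULES = {
    "statements": ("statement", ("t", ";")),
    "params": ("param", ("nt", "comma")),
    "items": ("item", ("opt", "sep")),
}


def expand_list(node, rules=LIST_RULES):
    """Template 1: NT,l => LIST,l ( NT,r1  LOPT,l )."""
    r1, _ = rules[node.rule]
    return Node("LIST", node.rule, [Node("NT", r1), Node("LOPT", node.rule)])


def expand_lopt(parent, index, rules=LIST_RULES):
    """Template 2: the three forms selected by the second argument x."""
    lopt = parent.children[index]
    assert lopt.op == "LOPT" and lopt.rule == parent.rule
    r1, (kind, value) = rules[lopt.rule]
    if kind == "nt":            # x = r2    ->  NT,r2  NT,r1  LOPT,l
        new = [Node("NT", value), Node("NT", r1)]
    elif kind == "opt":         # x = [r2]  ->  COPT,r2  NT,r1  LOPT,l
        new = [Node("COPT", value), Node("NT", r1)]
    else:                       # x = "t"   ->  NT,r1  LOPT,l
        new = [Node("NT", r1)]
    parent.children[index:index + 1] = new + [Node("LOPT", lopt.rule)]


p = expand_list(Node("NT", "params"))
expand_lopt(p, 1)
print([(c.op, c.rule) for c in p.children])
# [('NT', 'param'), ('NT', 'comma'), ('NT', 'param'), ('LOPT', 'params')]
```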
e. Predefined

Rules

    rp : pdf
Template 1:


    NT,p => TERM,p


The transformation for the label NT,p is a single
node replacement, the NT operator being overwritten with
TERM and the rulename remaining unchanged.


Template 2:


                PDF(p,string)   if PDF(p,string) valid
    TERM,p  =>
                TERM,p          otherwise

The label TERM,p must be accompanied by
appropriate user input before the transformation is applied.
The exact nature of the transformation applied is dependent
upon the predefined rulename, but certain characteristics of
the transformation may be generalized. The transformation
results in either a single node replacement or a possibly
many-leveled subtree; it may not generate siblings or a
forest of siblings. As regards the deletion restrictions,
the subtree created by a predefined function is considered a
single unit for editing purposes that is not subject to
internal deletions or insertions. System-provided
predefined rules, if the input is valid, invariably result in a
bound node or subtree of bound nodes; a free node in the
subtree would imply knowledge of language-specific grammar
rules which no general-purpose predefined function could
have. User-supplied predefined functions, allowable as a
language-specific extension to the system, may admit such
free nodes; however, the language implementor is responsible
for ensuring that the syntactic integrity of the AST is
preserved over such transformations.

If the input accompanying the label is rejected
by the predefined function, the transformation is null and
the node is unchanged.
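
The predefined templates can be sketched as follows. The identifier scanner `pdf_id`, its regular expression, the `IDENT` operator of the bound node it returns, and the list standing in for a symbol table are all hypothetical; they only illustrate the contract described above: a valid string yields a bound node, and a rejected string leaves the TERM node unchanged.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str
    rule: str
    children: list = field(default_factory=list)
    value: object = None        # payload filled in by a predefined function


SYMBOL_TABLE = []               # hypothetical stand-in for a symbol table


def pdf_id(text):
    """Hypothetical predefined scanner for identifiers: returns a bound node
    referencing a symbol-table entry, or None if the input is rejected."""
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", text):
        return None
    SYMBOL_TABLE.append(text)
    return Node("IDENT", "id", value=len(SYMBOL_TABLE) - 1)


PREDEFINED = {"id": pdf_id}     # predefined rulename -> function


def expand_predefined(node):
    """Template 1: NT,p => TERM,p."""
    assert node.op == "NT"
    return Node("TERM", node.rule)


def apply_term(node, text, predefined=PREDEFINED):
    """Template 2: TERM,p => PDF(p, string) if valid, otherwise TERM,p."""
    assert node.op == "TERM"
    result = predefined[node.rule](text)
    return node if result is None else result


term = expand_predefined(Node("NT", "id"))
bound = apply_term(term, "count")
print(bound.op, SYMBOL_TABLE[bound.value])    # IDENT count
print(apply_term(term, "9lives") is term)     # True
```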

f. Undefined
Implicit Template:


    NT,u => NT,u


The undefined label undergoes a null, implicit
transformation.

4. User Attention
Of the eleven transformations, six define the action
to be taken for the six possible nonterminal labels. The
remaining five, the second transformation template for each
of the five defined rule types, all require some form of
user attention prior to the application of the specified
template. The form of user attention required is dependent
upon the operator but generally may be characterized as
consisting of two parts: an indication that the user wishes to
direct attention to the current node, and a possibly empty
character string used by the transformation as an input
parameter. The five transformations requiring user
attention fall into three classes, as follows:
a. IOPT, COPT, LOPT
The three optional operators require simply that
the user elect to expand the optional node. Thus directing
attention to an optional node is sufficient for application
of the template, and the character string parameter is not
required.
b. ALT
The Alternation operator requires that the user,
after directing attention to the alternation node, provide a
character to be used in determining which of the
possible alternatives is desired.
c. TERM
The TERM operator requires, in addition to the
user's attention, a character string for processing by the
predefined rule associated with the node.
The exact format of the user attention parameter
is implementation dependent, but is summarized abstractly as
follows, by operator:

    operator    user attention

    COPT        <elect option>
    IOPT        <elect option>
    LOPT        <elect option>
    ALT         <char>
    TERM        <string>


5. Deletion and Insertion

Earlier it was asserted that templates defined in 
accordance with an appropriate set of restrictions would 
allow deletion of any subtree from the AST using only the 
rulename of the subtree's parent node. We now verify that 
assertion based on the templates as defined above. 

Of the six rule types, three may be excluded from
consideration as potential parents of nodes to be deleted.
Undefined rules never form children and thus are never
referenced for deletion. Predefined rules are defined to
create subtrees which can be edited only as complete units.
Alternation rulenames never appear in bound nodes of the AST,
since the alternation rulename in a free node is overwritten
with the rulename of the alternative rule chosen. Thus only
concatenation, iteration, and list rules remain as potential
parents of subtrees whose deletion is desired. The parent's
rule type in each of these three cases may be positively
identified by the parent node's operator: if the operator is
ITER, then the parent rule is an iteration; if LIST, then it
is a list rule; and otherwise (either HEAD or a
predefined function), the parent rule is a concatenation.
The templates for these three rule types allow
recreation of the original label which existed when the root
node of the subtree to be deleted was initially created.

A parent concatenation rule, upon initial expansion,
creates a fixed number of children, each of the form NT,r
or COPT,r. By inspection, no transformation or sequence of
transformations on these labels for any of the six rule
types may create additional siblings under the parent
concatenation rule, nor may they reorder the subtrees initially
created. Thus the initial fixed number and order of
children created remains constant. Suppose some subtree, say
the ith, under the concatenation rule parent is selected for
deletion. The sibling which was originally created by the
concatenation rule as its ith child may be reconstructed by
traversing the concatenation rule template until the ith
sibling list element is encountered. This sibling list
element contains the information by which the node replacing
the subtree to be deleted may have its operator and rulename
fields reinitialized. Deletion of a subtree under an
iteration rule parent node is made possible by the consistent
manner in which the two iteration rule templates create
children of the parent node. The first child is created by
the first template, and the deletion process for the first
subtree is similar to concatenation deletion. Subsequent
subtrees, up to the trailing IOPT,i node, are created by the
second template, and the information necessary to recreate
any label may be retrieved from the first sibling list
element of that template. The IOPT,i child is invariant in
location and form and is not subject to deletion.

Deletion of the first subtree under a list rule
parent is handled in the same manner as the first subtree
under an iteration parent. Subsequent subtrees, up to the
LOPT,l node, are also similar to iteration rule subtrees
except that they may have been created in pairs.
Examination of the list rule's second template will reveal whether
subtrees after the first must be treated in pairs or may be
handled singly. In either event the information necessary
to recreate any given child is available in the template.
The LOPT,l child is not subject to deletion.

So far deletion has been concerned only with
"unparsing" an incorrectly formed subtree to a single
ancestor node so that the subtree may be correctly reconstructed.
For subtrees of concatenation rules this is the only form of
deletion which retains the valid derivation property.
Subtrees of iteration rules, however, are all derived from the
same label and thus are all syntactically equivalent when
viewed from their root. Further, the only restriction on
the number of iteration rule node subtrees is that there
must be at least one in addition to the IOPT node. Thus,
deletion of an iteration rule subtree, excepting the
trailing IOPT node, could be realized as the actual
physical deletion of the entire subtree including the root
node, as long as at least one subtree remains. As a
corollary, a node properly labelled in accordance with the
iteration parent rule could be inserted in front of any node in
the iteration sibling string without violating the valid
derivation property. The insertion procedure requires the
same information as deletion, the rule type and rulename of
the parent node, in order to construct an appropriately
labelled node for insertion into an existing iteration node
sibling string.

List rules whose second argument is a terminal
symbol form AST structures equivalent to iteration constructs,
and thus physical deletion (as opposed to unparsing to a
single node) as well as insertion are valid operations.
List rules in general present a more complicated problem in
that subtrees after the first are formed in pairs. However,
extending the argument concerning syntactic equivalence of
subtrees to pairs of subtrees is straightforward and allows
physical deletion and insertion to apply to list rule
subtrees as well.

In summary, deletion is realized as a replacement
operation for all concatenation rule subtrees and for
solitary iteration and list rule subtrees, wherein the subtree
to be deleted is replaced by a single node which is a
reconstruction of the subtree's initial state. Under iteration
and list parents where other subtrees exist, deletion
results in the physical removal of the subtree or subtree
pair; reconstruction may be accomplished at the same or some
other location under the parent by a separate insertion
operation.
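
The deletion and insertion rules for concatenation and iteration parents can be summarized in a sketch. The rule tables, the tuple encoding (`("t", text)`, `("nt", name)`, `("opt", name)`), the `Node` class, the `IDENT` operator, and the function names are hypothetical; list parents are left out here and would follow the same pattern, handling pairs where the second list template creates them. Note that `delete_child` dispatches only on the parent's operator and rulename.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str
    rule: str
    children: list = field(default_factory=list)


# Hypothetical grammar fragments.
CONCAT_RULES = {"simple": [("t", "program"), ("nt", "name"), ("nt", "decls"),
                           ("opt", "externs"), ("nt", "block"), ("t", "end")]}
ITERATION_RULES = {"stmts": "statement"}


def initial_child_label(rule, i):
    """Walk the concatenation rule to the label its i-th child (0-based) had."""
    labels = [Node("NT" if kind == "nt" else "COPT", name)
              for kind, name in CONCAT_RULES[rule] if kind != "t"]
    return labels[i]


def delete_child(parent, i):
    """Delete the i-th subtree using only the parent's operator and rulename."""
    if parent.op == "ITER":
        if parent.children[i].op == "IOPT":
            raise ValueError("the trailing IOPT node is not subject to deletion")
        if len(parent.children) > 2:          # other real subtrees remain
            del parent.children[i]            # physical deletion
        else:                                 # solitary subtree: unparse it
            parent.children[i] = Node("NT", ITERATION_RULES[parent.rule])
    elif parent.op == "LIST":
        raise NotImplementedError("list parents are omitted from this sketch")
    else:                                     # HEAD or predefined: concatenation
        parent.children[i] = initial_child_label(parent.rule, i)


def insert_child(parent, i):
    """Insert a freshly labelled NT,r1 node in front of child i of an ITER node."""
    assert parent.op == "ITER"
    parent.children.insert(i, Node("NT", ITERATION_RULES[parent.rule]))


head = Node("HEAD", "simple", [Node("IDENT", "id"), Node("NT", "decls"),
                               Node("NT", "externs"), Node("HEAD", "block")])
delete_child(head, 2)     # the elected option reverts to COPT,externs
delete_child(head, 3)     # the expanded block reverts to NT,block

it = Node("ITER", "stmts", [Node("HEAD", "statement"),
                            Node("HEAD", "statement"), Node("IOPT", "stmts")])
delete_child(it, 0)       # another subtree remains: physical removal
delete_child(it, 0)       # now solitary: replaced by NT,statement
insert_child(it, 0)       # a new NT,statement in front of it
print([(c.op, c.rule) for c in it.children])
# [('NT', 'statement'), ('NT', 'statement'), ('IOPT', 'stmts')]
```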


C. DISPLAY SCHEMAS 

Thus far a method of constructing an AST has been
developed using transformations to expand nodes in
accordance with a set of templates stored by rulename, such
that the AST represents a valid derivation of the associated
grammar. Attention is now focused on displaying the AST; in
particular, a method is developed in this section by which
the valid derivation of the grammar which the AST represents
may be displayed.

Display of the AST is the result of a generalized
preorder traversal, beginning with the root node, with
terminal and nonterminal symbols being displayed in accordance
with schemas associated with each label. The display need
not be strictly preorder, since provision is made to display
subtrees under a parent node in any order as directed by the
parent's rule schema. This capability is provided to allow
for the case where the evaluator may have to access the
subtrees in a different order than that implied by the syntax
of the target language.

Schemas are referenced by the rulename associated with
each bound and free node, in a manner similar to the
referencing of templates, so that the display associated with
a subtree is independent of the context of that subtree.

The valid derivation need not be displayed in its
entirety. For example, the means is provided to display all
undefined nonterminals as they occur in the AST as part of
the valid derivation. If the language implementor chooses,
however, he may elect not to display any of the undefined
nonterminals which appear in a partial grammar he is
implementing in its incomplete state.

In the following two sections, first the schema language
is defined and then the formation of schemas for each of the
rule types is developed.

1. Schema Language

There are three types of display information
provided for in the schema language: format control, literal
strings, and subtree indicators. A system for handling
comments has not yet been developed. However, it is envisioned
as an extension to the schema language and not as part of
the grammar for the target language.

Format control information is encoded mnemonically
in the two-letter capital strings "NL", "TB", and "UT",
interpreted respectively as "newline", "tab", and "untab".
UT simply causes a variable, "tabcount", to be decremented.
TB causes a tab control character to be transmitted to the
output device and increments "tabcount". NL causes a
newline character and "tabcount" tabs to be transmitted to the
output device. Format control information is provided for
readability only.

Literal strings are arbitrary character strings,
delimited by double quotes, that are transmitted directly to
the output device. Literal strings provide the mechanism
for the display of terminal and nonterminal symbols in the
derivation represented by the AST.

A subtree indicator, denoted by a dollar sign
followed by an integer interpreted as a child number, directs
that that subtree be entirely displayed prior to resumption
of display of the current schema. An optional display
field, consisting of an equals sign followed by a literal
string, may accompany the subtree indicator to provide the
means for displaying undefined nonterminals, the three
optionals, and TERM nodes, as described in the following
paragraphs.
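
A possible tokenizer for the schema language, plus an output object implementing the NL/TB/UT behaviour described above. The concrete syntax accepted (optional whitespace between tokens, no escaped quotes inside literals) and all names here are assumptions made for this sketch.

```python
import re

# One schema token: a format control code, a quoted literal, or a subtree
# indicator $n with an optional display field ="...".
TOKEN = re.compile(r'\s*(?:(NL|TB|UT)|"([^"]*)"|\$(\d+)(?:="([^"]*)")?)')


def tokenize_schema(schema):
    """Split a schema string into ("ctl", code), ("lit", text) or
    ("sub", child_number, display_field_or_None) tokens."""
    tokens, pos, schema = [], 0, schema.rstrip()
    while pos < len(schema):
        m = TOKEN.match(schema, pos)
        if m is None:
            raise ValueError(f"unrecognized schema text: {schema[pos:]!r}")
        ctl, lit, child, display_field = m.groups()
        if ctl:
            tokens.append(("ctl", ctl))
        elif lit is not None:
            tokens.append(("lit", lit))
        else:
            tokens.append(("sub", int(child), display_field))
        pos = m.end()
    return tokens


class Output:
    """Output device carrying the "tabcount" variable."""

    def __init__(self):
        self.parts, self.tabcount = [], 0

    def control(self, code):
        if code == "UT":                  # untab: only decrements tabcount
            self.tabcount -= 1
        elif code == "TB":                # tab: emit a tab, increment tabcount
            self.parts.append("\t")
            self.tabcount += 1
        else:                             # NL: newline followed by tabcount tabs
            self.parts.append("\n" + "\t" * self.tabcount)

    def literal(self, text):
        self.parts.append(text)


def render_flat(schema):
    """Render a schema made only of format control and literal strings."""
    out = Output()
    for token in tokenize_schema(schema):
        if token[0] == "ctl":
            out.control(token[1])
        elif token[0] == "lit":
            out.literal(token[1])
        else:
            raise ValueError("subtree indicators need an AST to display")
    return "".join(out.parts)


print(repr(render_flat('"begin" TB NL "x;" UT NL "end"')))   # 'begin\t\n\tx;\nend'
```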

An undefined nonterminal may appear for a variety of
reasons, the most common being as a placeholder in a partial
grammar. Since the rule for the nonterminal does not exist,
there can be no schema, so the optional field, if provided,
is invariably used. If not provided, nothing will be
displayed for the undefined nonterminal.

The three optional nodes, COPT, IOPT, and LOPT,
require special handling since there is nothing inherently
"optional" about a rule. Rather, the optional nodes are
placeholders to indicate to the user the possibility that
the rule specified may be invoked, if the user so chooses,
but also may be left uninvoked in a "complete" AST. Since
it is the father rule which holds the information that this
rule invocation may be an as yet unelected option, the
father rule schema contains the information, in the form of
an optional display field, to display the node accordingly.

The predefined rule referenced by a TERM node is in
general a language-independent system routine. As such, it
has no knowledge of the nonterminal name which it, when
invoked by the user on a string, is replacing in the valid
derivation. Since the father rule does have this
information, the father rule schema contains the optional display
field necessary to display properly, within the context of
the grammar, the rulename which the predefined rule will
replace. In other words, this facility allows the language
implementor to rename the predefined rule for display
purposes.

When an option has been elected or a TERM node's
predefined rule has produced a bound node, both of which are
displayable in their own right, the optional field
associated with the subtree indicator is no longer necessary and
will be ignored by the display algorithm. While these nodes
remain free, however, the optional display field provides
the user the information he needs to expand these nodes, as
well as a logical symbol under which the GDE may place the
cursor to indicate the current node.
A subtree indicator which may reference one of the
three node types discussed above must, in order that a valid
derivation be displayed, include an appropriate optional
display field. The implementor may, of course, omit such a
display field, in which case nothing will be displayed for
the node. In the case of an undefined nonterminal this may
be the most pleasing result; in the case of optionals and
TERM nodes such a display will not accurately reflect all
free nodes in the AST that may be of interest to the user.
The omission of such an optional display field may be
regarded under normal circumstances as a mistake in the
language definition.
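
The rule for optional display fields can be sketched as a recursive display function. It assumes pre-tokenized schemas keyed by rulename, treats a nonterminal with no schema as undefined, and represents a bound predefined leaf with a hypothetical `PDF` operator whose `text` stands in for the output of a display routine such as "idout". The schemas and node contents are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str
    rule: str
    children: list = field(default_factory=list)
    text: str = ""     # bound predefined leaves only (stands in for e.g. "idout")


# Hypothetical pre-tokenized schemas keyed by rulename:
# ("lit", text) or ("sub", child_number, display_field_or_None).
SCHEMAS = {
    "simple": [("lit", "program"), ("sub", 1, "<name>"), ("sub", 2, "(decls)"),
               ("sub", 3, "[externs]"), ("sub", 4, None), ("lit", "end")],
    "externs": [("lit", "external")],
    "block": [("lit", "{"), ("lit", "}")],
}
PLACEHOLDER_OPS = {"COPT", "IOPT", "LOPT", "TERM"}


def needs_field(node, schemas):
    """Free optionals, TERM nodes and undefined nonterminals have no schema of
    their own and rely on the father's optional display field."""
    return node.op in PLACEHOLDER_OPS or (node.op == "NT" and node.rule not in schemas)


def display(node, schemas=SCHEMAS):
    if node.op == "PDF":                      # hypothetical bound predefined leaf
        return [node.text]
    out = []
    for token in schemas[node.rule]:
        if token[0] == "lit":
            out.append(token[1])
            continue
        _, j, display_field = token
        child = node.children[j - 1]
        if needs_field(child, schemas):
            if display_field is not None:     # omitted field: display nothing
                out.append(display_field)
        else:                                 # child has its own schema; field ignored
            out.extend(display(child, schemas))
    return out


tree = Node("HEAD", "simple", [Node("TERM", "id"), Node("NT", "decls"),
                               Node("COPT", "externs"), Node("HEAD", "block")])
print(" ".join(display(tree)))   # program <name> (decls) [externs] { } end
tree.children[0] = Node("PDF", "id", text="demo")
tree.children[2] = Node("HEAD", "externs")
print(" ".join(display(tree)))   # program demo (decls) external { } end
```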

2. Rule-Specific Schemas

Construction of schemas is a straightforward
process when keyed to rule type, since the schema subtree
indicators and literal strings must conform to both the R-ARGOT
grammar rule definition and to the transformation templates
associated with the rule definition in a consistent way. In
the schema constructions which follow, format control
information is ignored, but generally may be inserted into a
schema any place that a terminal symbol is allowed.

a. Concatenation
Rules


    rc : x1 x2 ... xN .        where xk ∈ { rk | [rk] | "tk" }
Schemas


    cs : s1 s2 ... sN


    where  sk = "tk"              if xk = "tk"
           sk = $j="[rulename]"   if child j is optional
           sk = $j="<rulename>"   if child j is predefined
           sk = $j="(rulename)"   if child j is undefined
           sk = $j                otherwise


A single schema is required for the concatenation
rule and may be constructed, if all nonterminals are
realized as children in the order they are listed in the
R-ARGOT rule, as follows:

Reading the R-ARGOT concatenation rule from left to
right, for each symbol xk:
    if xk is a terminal symbol, copy it to
        the schema as a literal string;
    if xk is the jth nonterminal and is optional,
        write $j="[rulename]" to the schema;
    if xk is the jth nonterminal and is predefined,
        write $j="<rulename>" to the schema;
    if xk is the jth nonterminal and is undefined,
        write $j="(rulename)" to the schema;
    if xk is the jth nonterminal symbol and is
        not optional, undefined, or a predefined
        rule, write $j to the schema.
The above algorithm for the construction of a
concatenation schema is for the display of the entire valid
derivation. If display of an undefined nonterminal, for
example, is not desired, the subtree indicator for that
child could either be written without the optional display
field or be omitted entirely. While this algorithm assumes
that the implementor wrote the concatenation template such
that the children correspond in order to the nonterminals in
the rule, this need not be the case. The schema must know
the order, however, so that the display is an accurate
representation of the derivation obtained from the grammar.
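
The construction algorithm translates almost line for line into Python. The tuple encoding of the rule and the `rule_types` lookup are hypothetical. Applied to the rule `simple : "program" name decls [externs] block "end" .`, with "name" predefined and "decls" undefined, it produces `"program"$1="<name>"$2="(decls)"$3="[externs]"$4"end"`.

```python
def concatenation_schema(symbols, rule_types):
    """Build a concatenation schema by reading the R-ARGOT rule left to right.

    symbols: ("t", text), ("nt", name) or ("opt", name) per rule symbol
             (a hypothetical encoding of the rule).
    rule_types: hypothetical lookup giving "predefined" or "undefined" for
                such nonterminals; anything else is treated as ordinary.
    """
    parts, j = [], 0
    for kind, value in symbols:
        if kind == "t":                      # terminal: copy as a literal string
            parts.append(f'"{value}"')
            continue
        j += 1                               # value is the jth nonterminal
        if kind == "opt":
            parts.append(f'${j}="[{value}]"')
        elif rule_types.get(value) == "predefined":
            parts.append(f'${j}="<{value}>"')
        elif rule_types.get(value) == "undefined":
            parts.append(f'${j}="({value})"')
        else:
            parts.append(f"${j}")
    return "".join(parts)


simple = [("t", "program"), ("nt", "name"), ("nt", "decls"),
          ("opt", "externs"), ("nt", "block"), ("t", "end")]
print(concatenation_schema(simple, {"name": "predefined", "decls": "undefined"}))
# "program"$1="<name>"$2="(decls)"$3="[externs]"$4"end"
```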

As an example of each of the possibilities
listed above, consider the concatenation rule

    simple : "program" name decls [externs] block "end" .

where the nonterminal "name" refers to a predefined
function, "decls" is an undefined nonterminal, and "block" is a
well-defined, non-optional, non-predefined nonterminal. The
schema for this rule, without any format control characters,
would be

    "program"$1="<name>"$2="(decls)"$3="[externs]"$4"end"


b. Alternation


Rules

    ra : { char1:r1 ! char2:r2 ! ... ! charn:rn }
Schemas:

    as1 : "{ alternation rulename }"

    as2 : "{ char1:rulename1 ! char2:rulename2 ! ... ! charn:rulenamen }"
Since the transformations defined for an
alternation rule are both single node replacements, the second
one of which results in the alternation rulename being
overwritten, it is clear that no semantic or schematic
information required in a sentence in the language, as
opposed to a valid derivation in general, may be associated
with the schema for an alternation rule, since once the
alternative choice is made by the user, the rulename and
thus access to the schema is no longer present in the AST.
Thus the schema for an alternation rule could have been
implemented as a subtree indicator optional field. We
choose to provide a pair of explicit display schemas
associated with the alternation rulename, however, to implement a
"help" mechanism. The first display schema consists simply
of a literal string composed of the alternation rulename in
curly brackets and is the schema normally used to display
the node. The second, optional at user request, is again
simply a literal string, but with the alternative rules and
their associated keystrokes displayed in curly brackets.

For example, the following alternation rule

    statement : { a:assignment ! c:conditional ! b:block }

would be displayed normally by the schema

    "{ statement }"

or, if the user desired to see the alternatives and their
keystrokes, by

    "{ a:assignment ! c:conditional ! b:block }"
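
Both alternation schemas can be generated mechanically from the rule. A small sketch, assuming the alternatives are supplied as (keystroke, rulename) pairs in rule order; the function name is hypothetical.

```python
def alternation_schemas(rulename, alternatives):
    """Return (as1, as2) for an alternation rule.

    alternatives: (keystroke, rulename) pairs in rule order (hypothetical encoding).
    """
    as1 = f'"{{ {rulename} }}"'                      # normal display
    choices = " ! ".join(f"{key}:{name}" for key, name in alternatives)
    as2 = f'"{{ {choices} }}"'                       # "help" display
    return as1, as2


as1, as2 = alternation_schemas(
    "statement", [("a", "assignment"), ("c", "conditional"), ("b", "block")])
print(as1)   # "{ statement }"
print(as2)   # "{ a:assignment ! c:conditional ! b:block }"
```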
c. Iteration


Rules

    ri : (iteration of the single nonterminal r1; no terminals)
Schemas:

    is1 : $1


    is2 : "[iteration rulename]"


The iteration (as well as the list) rules differ
from concatenation in that they may have an indefinite
number of children requiring display. Since no terminals
are allowed in an R-ARGOT iteration rule and since every
child is formed independently of the others in the sibling
string, display of an iteration, while involving some work
on the part of the display algorithm to traverse all of the
subtrees one at a time, requires a pair of very simple
schemas. The first is simply a subtree indicator used for
display of all subtrees except the last. The subtree
indicator may include an optional field for undefined and
predefined rule displays; from the transformation template
definitions it is apparent that no child of an iteration node
can be a concatenation optional node. The second schema is used
for display of the last child, invariably an IOPT node.

d. List


Rules

    rl : # r1 , x .        where x ∈ { r2 | [r2] | "t" }
Schemas:
    ls1 : $1


    ls2 : $1 $2                  if x = r2
          $1="[rulename2]" $2    if x = [r2]
          "t" $1                 if x = "t"


    ls3 : "[list rulename]"


The list rule requires three schemas in order to
properly display the unique format the list structure
conveys. Like the iteration rules, the list may have an
indefinite number of subtrees; however, R-ARGOT allows the
second argument to be a terminal symbol. Without this
facility the inclusion of the list rule type is hardly
justified, since the most usual use of the construct is to
separate grammatical entities with some punctuation mark.

The first schema is used for display of the
first child. Subsequent children, or pairs of children,
depending on the specific list rule, up to the last in the
sibling string, are displayed by the second schema. The
display algorithm must keep track of which children it has
displayed in traversing the list in order that this
three-schema structure display the sequence of subtrees correctly.
The third schema is used for display of the last child,
invariably an LOPT node.

As an example of the list rule schemas, consider
the R-ARGOT rule

    statements : # statement , ";" .

The schemas generated to display this rule would be

    ls1 : $1

    ls2 : ";" $1

    ls3 : "[statements]"

Note that an NL format control character would be appropriate
after the ";" terminal in ls2 and before the literal string
in ls3 in order to place each statement and semicolon pair
on a separate line.
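
The bookkeeping the list display algorithm must do can be sketched as follows. Schemas are given as pre-tokenized lists, display fields are omitted for brevity, and `show` is a hypothetical callback that displays one child subtree; the `group` parameter records whether the second schema consumes single children or pairs. The statement strings are stand-ins for displayed subtrees.

```python
def display_list(children, ls1, ls2, ls3, group, show):
    """Display the children of a LIST node with its three schemas.

    children: subtrees under the LIST node; the last one is the LOPT node.
    ls1, ls2, ls3: schemas as token lists of ("lit", text) or ("sub", j),
                   where j counts within the current child or pair.
    group: 1 when x is a terminal, 2 when x is r2 or [r2] (pairs).
    show: function that displays one child subtree as a string.
    """
    def apply(schema, current):
        return [tok[1] if tok[0] == "lit" else show(current[tok[1] - 1])
                for tok in schema]

    out = apply(ls1, children[:1])                  # first child
    i = 1
    while i < len(children) - 1:                    # middle children or pairs
        out += apply(ls2, children[i:i + group])
        i += group
    out += apply(ls3, children[-1:])                # trailing LOPT node
    return out


ls1, ls2, ls3 = [("sub", 1)], [("lit", ";"), ("sub", 1)], [("lit", "[statements]")]
stmts = ["x := 1", "y := 2", "LOPT"]
print("".join(display_list(stmts, ls1, ls2, ls3, 1, str)))
# x := 1;y := 2[statements]
```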


e. Predefined


A predefined display function should accompany
each predefined rule scanner. The display algorithm will
pass the subtree created by the predefined scanner to the
named display function. For example, the predefined scanner
"id" will scan an identifier, place it in the symbol table,
and fill in the TERM node with the information allowing
reference to that symbol table entry for the evaluator. On
display, the routine "idout" will be called to cause the
referenced identifier to be displayed.


D. THE LANGUAGE DEFINITION MODULE 

The Language Definition Module is the grammatical
database used by the Grammar Directed Editor in the
construction and evaluation of an AST. The Language Definition
Module has a fixed and an interchangeable component. The
fixed component consists of the system predefined rules and
functions. The interchangeable component, known as the
Language Definition, is composed of the language-specific
grammar rules, templates, and schemas. In addition, the
Language Definition may optionally include user-supplied
predefined rules and functions supplementing or superseding
those permanently installed in the system.
1. The Language Definition 

The primary component of the Language Definition is
the internal representation of the language-specific grammar
as an ordered collection of grammar rules and their
associated templates and schemas. The Language Definition, apart
from user-supplied predefined rules and functions, consists
of a Rule Tree and a string table. The string table
contains the character string representation of the templates
and schemas for each rule. The Rule Tree is the ordering
mechanism for the grammar rules which provides access to the
templates and schemas in the string table. The Rule Tree is
a four-tiered hierarchy, the uppermost level of which is a
head node for the tree. The next level consists of a
sequence of head nodes, one for each defined grammar rule.
Under each grammar rule node is a pair of head nodes, the
first for the templates associated with the rule and the
second for the schemas. The fourth, bottom-most tier
consists of leaf nodes containing pointers to the template and
schema strings stored in the string table. The regularity
designed into the template and schema definitions for each
of the rule types allows the Editor to access any leaf of the
Rule Tree using only the operator and rulename
information in an AST node label.
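
A sketch of the four-tier Rule Tree and string table, assuming rules are supplied in grammar order as lists of template and schema strings. The mapping from operator to template index (NT to the first template; ALT, IOPT, LOPT, and TERM to the second) follows the template numbering used in this chapter; the class and function names, and the template strings themselves, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    name: str
    children: list = field(default_factory=list)
    ref: int = -1          # bottom tier only: index into the string table


# Labels handled by a rule's second template in this sketch; NT labels use
# the first template (COPT and undefined rules need no stored template).
SECOND_TEMPLATE_OPS = {"ALT", "IOPT", "LOPT", "TERM"}


def build_rule_tree(rules):
    """rules: {rulename: (template_strings, schema_strings)}, in grammar order.
    Returns the four-tier Rule Tree and the string table it points into."""
    strings, root = [], TreeNode("grammar")                # tier 1: tree head
    for rulename, (templates, schemas) in rules.items():
        heads = []
        for label, texts in (("templates", templates), ("schemas", schemas)):
            head = TreeNode(label)                         # tier 3
            for text in texts:
                strings.append(text)
                head.children.append(TreeNode("leaf", ref=len(strings) - 1))  # tier 4
            heads.append(head)
        root.children.append(TreeNode(rulename, heads))    # tier 2: one per rule
    return root, strings


def find_rule(root, rulename):
    return next(node for node in root.children if node.name == rulename)


def template_for(root, strings, op, rulename):
    """Locate a template using only the operator and rulename of an AST label."""
    index = 1 if op in SECOND_TEMPLATE_OPS else 0
    leaf = find_rule(root, rulename).children[0].children[index]
    return strings[leaf.ref]


def schema_for(root, strings, rulename, index):
    leaf = find_rule(root, rulename).children[1].children[index]
    return strings[leaf.ref]


root, strings = build_rule_tree({
    "stmts": (["NT,i => ITER,i ( NT,statement IOPT,i )",
               "IOPT,i => NT,statement IOPT,i"],
              ["$1", '"[stmts]"']),
})
print(template_for(root, strings, "IOPT", "stmts"))   # IOPT,i => NT,statement IOPT,i
```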

Appendix D is an Intermediate-Level Language
Definition Grammar. Encoded by hand into a Language Definition as
shown in Appendix E, the ILD Grammar provides the means to
generate a Grammar Directed Editor for the construction of
ASTs representing language-specific Language Definitions.
When such an AST is evaluated by the predefined function
ILD, the result is a language-specific Language Definition
which may be installed in the Language Definition Module and
used to construct applications-oriented ASTs in the
language defined by the grammar. Appendix F presents a
simple example of such an applications-oriented Language
Definition from which ASTs representing strictly formatted
memoranda may be constructed using the GDE.

The ILD Grammar allows definition of grammars on an
assembly-language level, i.e., many details which are
computable from the R-ARGOT grammar rule must be entered by the
user. For example, in the construction of an iteration rule
the user is required to enter "rulename1" and "i-rulename"
in a consistent manner throughout the formation of the
templates and schemas. However, at this low level the
mechanisms for checking such consistency do not exist. Thus the
ILD Grammar is seen as a flexible but error-prone tool
suitable for use primarily as a bootstrap mechanism for the
definition and implementation of a High-Level Language
Definition Grammar which automatically derives as much
information from the R-ARGOT rule as is possible. For
grammars in which all nonterminal children of concatenation
rules are to be created and displayed in the order listed in
the rule, an extended R-ARGOT notation which provided the
facility for inclusion of format control information and a
means for specification of predefined functions as head
nodes of concatenations would allow such automatic
derivation. Development of such an extended notation, as well as
the corresponding HLD Grammar and function, is deferred
until the symbol table and evaluator designs are complete.
2. Predefined Rules 

The set of system predefined rules provides the user
a mechanism for entering strings representing simple, common
constructs, such as identifiers and numbers, as well as more
involved constructs, such as expressions, which, even though
composed of many parts and perhaps generating multinode
subtrees in the AST, may be most conveniently viewed by the
user as representing single logical units. Predefined rules
are built-in, optional extensions to the Language Definition
which provide the language implementor with a set of
primitives upon which he may base his grammatical constructs.
The set of predefined rules is modifiable and extensible by
the language implementor, who may include, as an adjunct to
the grammar definition, a set of predefined rules which
supersede or complement the set permanently installed in the
Language Definition Module.

Predefined rules may be viewed as a deviation from
the grammar directed editing philosophy espoused throughout
this work. The use of predefined rules allows the entry,
after all, of syntactically incorrect strings which are not
immediately, in the sense of character-at-a-time immediacy,
detected and rejected as invalid. For example, compare a
"pure", character-at-a-time grammar directed editor with a
predefined rule augmented GDE on the terminal <string>,
defined for illustration to be the concatenation of any
characters except a space, terminated by a carriage
return. In the pure system, each character is examined and
its validity checked as it is typed. In this example, if
the user enters a string of valid characters and then a
space, he is immediately informed that the space is
unacceptable and is able to proceed without retyping the
portion of the string thus far entered. The predefined rule
system, however, would require that the entire string of
symbols, including the incorrect space, be entered before
rejecting it, and the user would have to retype the
corrected string in its entirety.
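The difference between the two disciplines can be sketched in C, assuming the illustrative <string> terminal above (any character except a space, with the carriage return already stripped by the caller). The names pure_key, predefined_line and MAXLEN are hypothetical: the first is called once per keystroke and keeps what has been accepted so far, while the second sees only the completed line and accepts or rejects it as a whole.

```c
#include <stdbool.h>
#include <string.h>

#define MAXLEN 64   /* hypothetical buffer size */

/* Illustrative terminal <string>: any character except a space. */
static bool string_char_ok(char c) { return c != ' '; }

/* Pure editor: called once per keystroke. A valid character is appended
   to buf; an invalid one is refused at once, and everything typed so far
   stays in buf, so the user continues without retyping. */
bool pure_key(char buf[MAXLEN], char c)
{
    size_t n = strlen(buf);
    if (!string_char_ok(c) || n + 1 >= MAXLEN)
        return false;
    buf[n] = c;
    buf[n + 1] = '\0';
    return true;
}

/* Predefined rule: called once with the whole line. buf is replaced only
   if every character is valid; otherwise buf is left untouched and the
   whole string must be entered again. */
bool predefined_line(char buf[MAXLEN], const char *line)
{
    size_t n = strlen(line);
    if (n == 0 || n >= MAXLEN)
        return false;
    for (size_t i = 0; i < n; i++)
        if (!string_char_ok(line[i]))
            return false;
    memcpy(buf, line, n + 1);
    return true;
}
```

After the keys a, b, and a space, pure_key has refused only the space and its buffer still holds "ab"; predefined_line given "ab c" refuses the entire line and leaves its buffer unchanged.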

We grant that grammar directed editing down to the
smallest indivisible unit, the character, has a certain
appeal. However, our predefined rule compromise is
motivated by several advantages and mitigating arguments:
a. The time lapse between entering even a large
predefined rule input string, such as a complex expression,
and re-entering it if it is rejected as incorrect, is short.

b. The time lost in a predefined rule system in
retyping the usually short input strings accepted by most
predefined rules is offset by the time that would be lost in
a pure system that requires control characters to guide the
tree building via the language definition through the
various alternatives involved in the larger grammatical
constructs, such as expressions, that can easily be handled by
predefined rules.

c. The syntactic integrity of the AST is always
preserved by the system predefined rules since no change to 
the AST is made until the syntactic validity of the entire 
input string is confirmed. 

d. Predefined rules simplify the language
implementor's task by raising the level of the lowest
grammatical constructs that must be defined in the grammar.
Instead of having to work clear down to the character level,
predefined rules provide as primitives the facilities for
handling groups of characters, such as numbers, identifiers,
and strings, which are the basic building blocks of data
structures in general and programs in particular. 

e. Given automatic lexical analyzer and parser
generators, predefined rules for the class of grammatical
constructs envisioned are easily built.

f. The suitable choice of predefined rules frees
the language implementor from long-winded, needlessly
detailed grammatical constructions for a wide variety of
regularly-expressible productions. Grammars for language
definitions, given such a set of easily understandable
primitive constructions, would be more transparent and easier
for the user to assimilate.
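Points c and d can be illustrated with a minimal sketch of a predefined rule for an unsigned decimal <number>. The Leaf type and the rule_number function are hypothetical and are not part of the Language Definition Module as described; the point is only that the whole input string is checked before any node is created, so a rejected string leaves the AST unchanged.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical AST leaf produced by a <number> predefined rule. */
typedef struct Leaf {
    const char *rulename;   /* e.g. "number" */
    long value;
} Leaf;

/* Predefined rule for an unsigned decimal <number>. The entire input is
   validated first; only then is a leaf built. On failure NULL is
   returned, nothing has been added to the AST, and the caller re-prompts.
   (Overflow handling is omitted: strtol clamps to LONG_MAX.) */
Leaf *rule_number(const char *input)
{
    if (*input == '\0')
        return NULL;
    for (const char *p = input; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return NULL;

    Leaf *leaf = malloc(sizeof *leaf);
    if (leaf == NULL)
        return NULL;
    leaf->rulename = "number";
    leaf->value = strtol(input, NULL, 10);
    return leaf;
}
```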

It is recognized that taking the predefined rule
approach to its extreme limits could result in a
compiler-like editor wherein huge segments are submitted for analysis
to exceedingly complex predefined rules, thereby negating
the benefits to be gained from a more rational grammar
directed editing environment. However, within the
guidelines presented here, the predefined rule approach has
distinct advantages and leaves open avenues for exploration to
the language implementor.

3. Predefined Functions 

Nodes in the AST undergoing evaluation fall into
one of three categories: undefined, head, and function. The
class of undefined nodes includes all free nodes which may
still exist in the AST. Head nodes are the HEAD, ITER
and LIST operator nodes created for synthesis of the AST,
all of which are synonymous to the evaluator. Head nodes
have no computational capabilities during the evaluation
process but rather provide structure to the AST. Function
nodes have as their operator one of the predefined
functions. Function nodes are generated by concatenation
and predefined rules during synthesis of the AST and result
in calls to the corresponding predefined function during
evaluation. Function nodes may be leaves, as in nodes which
reference symbol table entries, or they may be interior
nodes. If interior, function nodes must have the number,
order, and type of subtrees expected by the predefined
function.
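A minimal C sketch of the three node categories, assuming a hypothetical table of predefined functions with fixed arities. The operator names add, if, and ref and the helper names are illustrative only, and only the number of subtrees is checked here; checking their order and type would require richer signatures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { NODE_UNDEFINED, NODE_HEAD, NODE_FUNCTION } NodeKind;

/* Hypothetical predefined-function signature: name and expected arity. */
typedef struct { const char *name; int arity; } FuncSig;

static const FuncSig predefined[] = {
    { "add", 2 }, { "if", 3 }, { "ref", 0 },   /* "ref": a leaf */
};

typedef struct Node {
    const char *op;          /* NULL for a free (undefined) node */
    int nchildren;
} Node;

/* HEAD, ITER and LIST are synonymous to the evaluator. */
NodeKind classify(const Node *n)
{
    if (n->op == NULL)
        return NODE_UNDEFINED;
    if (!strcmp(n->op, "HEAD") || !strcmp(n->op, "ITER") ||
        !strcmp(n->op, "LIST"))
        return NODE_HEAD;
    return NODE_FUNCTION;
}

/* A function node is well formed here if its operator names a predefined
   function and it has exactly as many subtrees as that function expects. */
bool function_node_ok(const Node *n)
{
    if (n->op == NULL)
        return false;
    for (size_t i = 0; i < sizeof predefined / sizeof predefined[0]; i++)
        if (strcmp(n->op, predefined[i].name) == 0)
            return n->nchildren == predefined[i].arity;
    return false;            /* unknown operator */
}
```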

The set of predefined functions defines the range of
computational power available to the evaluator and thus
limits the capabilities available to the user of the GDE. A
proposed set of system predefined functions, based on the
primitives discussed throughout (Pratt, 1975), is presented
in Appendix G. This set of system functions may be
augmented by the language implementor through additional or
superseding function definitions included as extensions in
the Language Definition.
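One plausible way to let Language Definition extensions augment or supersede system functions is a two-level lookup, sketched below in C. The table layout, the two-integer PrimFn signature, and lookup_function are all assumptions made for illustration: the implementor's entries are searched first, and names not found there fall back to the system set.

```c
#include <stddef.h>
#include <string.h>

typedef int (*PrimFn)(int, int);
typedef struct { const char *name; PrimFn fn; } FuncEntry;

static int sys_add(int a, int b) { return a + b; }
static int sys_max(int a, int b) { return a > b ? a : b; }

/* Hypothetical system predefined functions. */
static const FuncEntry system_funcs[] = {
    { "add", sys_add }, { "max", sys_max },
};

/* Extensions supplied with the Language Definition are searched first,
   so they supersede system functions of the same name; other names fall
   back to the system set. Returns NULL if the name is unknown. */
PrimFn lookup_function(const FuncEntry *ext, size_t n_ext, const char *name)
{
    for (size_t i = 0; i < n_ext; i++)
        if (strcmp(ext[i].name, name) == 0)
            return ext[i].fn;
    for (size_t i = 0; i < sizeof system_funcs / sizeof system_funcs[0]; i++)
        if (strcmp(system_funcs[i].name, name) == 0)
            return system_funcs[i].fn;
    return NULL;
}
```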
IV. PROGRAMS AS DATABASES


A. INTRODUCTION 

The material contained in this chapter was originally
developed during the search for a solution to a particular
problem: namely, that of storing the tree representation of
the synthesized program in secondary storage, with
complicated links to other data structures recorded in the
leaves, in such a way that pointer and reference integrity
could be maintained. This problem is aggravated by the
consideration that such a stored structure might well be
reloaded at a time when the physical contents of shared
memory spaces currently in use by the system are quite
different from the environment existing at the time that the
tree structure was originally created.

Once this problem was recognized as being a database
management problem, to which known techniques of database
design were applicable, the solution was straightforward.
The database design techniques described throughout this
chapter are taken from [Kroenke 1977]. The relatively
unorthodox view of programs as complex databases afforded by
this insight, however, is of more general interest, since it
provides a new perspective on the nature of programming
systems. In particular, these considerations provide some
justification for the hope that grammar-driven tree
synthesizers are capable of building up a
language-independent semantic structure.


B. PROGRAMS AS COMPLEX RELATIONSHIPS 

In viewing programs as databases, we first recognize
that the semantic contents of a program must be accessed by
two entities: the human reader or writer, and the processor
intended to execute the program. Comments excluded, the
information available to these two entities is almost
identical: that is, the human user can predict exactly the
operation of the processor for a given program, and the
processor deterministically executes the encoded intentions
of the programmer. So without loss of generality, we may
initially consider the program as a database accessed by the
processor. In the case of a machine language program, the
processor is the real machine on which the program is to
execute. For a higher-level language, the processor is the
hardware-software combination, or virtual machine, which is
capable of translating and executing the program.

The "semantic content" of the program is the collection
of potential evaluations which the processor may be required
to perform throughout the course of execution. For the
moment, we disregard the order of execution. Each
evaluation consists of the selection of one of many
primitive operations which the processor is capable of
performing, and the application of that chosen primitive
operation to a number of arguments, contained in one or more
registers, or memory locations addressable in some way.

Upon reflection, it is clear that both the set of
primitive operations and the set of addressable memory
locations are databases in their own right. The keyname, or
code by which an entry can be uniquely located, for the set
of primitive operations is the operation name, or opcode,
and that for the collection of potential arguments is the
address.

Clearly, the set of potential evaluations is, in the
terminology of database theory, a complex relationship
between primitive operations and registers. A given
operation may be applied to many different sets of arguments
within the course of a program execution, and a given
register may be the argument for a number of different
operations. There is no functional relationship between
items of the two databases in either direction, which means
that neither keyname can be used to uniquely identify an
item in the complex relationship between them.


C. DECOMPOSITION OF THE EVALUATION RELATION 

Standard database design techniques specify several ways
by which each of the elements of a complex relationship
between two databases can be referred to in a systematic and
unambiguous way during database access. Two general methods
of approach are used. One is to (arbitrarily) force the
relationship to be simple (many-to-one in one direction
only), by rejecting from the allowed range of possibilities
any members of the relationship which would cause the
relationship to be complex. In this case, the keyname for
one of the underlying databases can be used to unambiguously
refer to members of the relationship as well. The second
method is to decompose the relationship into two simple
relationships by constructing an intersection database.

There exist programming systems in which the first
strategy is adopted. For instance, if the restriction is
made that registers may not be re-used, so that at most one
primitive operation is applied to a given
register, a purely functional, or non-assignment, programming
system is obtained. In such a system, the only named
semantic elements are functions and constants (which may be
regarded as functions). Registers need not be named since
whenever one is needed, it can be drawn from a pool, used
once, and discarded by the processor.

This approach is considered mathematically elegant, but
it is not much used in non-academic programming systems.

In the second approach, an intersection database is
created, consisting of one entry for each distinct member of
the complex relationship. As a minimum, in order to allow
reference to the generating databases, each entry in the
intersection database must contain the keynames for those
entries in the original data sets with which it is
associated. Thus, for a programming notation, each entry in
the intersection database must contain, at a minimum, an
opcode and a register address for each argument, in some
form.

The archetypical entry for the intersection database
corresponding to the evaluation relationship is thus:

    OPCODE  ADDRESS(1)  ADDRESS(2)  . . .  ADDRESS(N)

This format is recognizable as the atomic unit of notation
for most common programming systems, from machine code to
high level languages. Each single such entry corresponds to
what is normally referred to as an instruction. In summary,
we assert that a program is nothing more than the
intersection database for instances of the evaluation of
accessible operands by the primitive operations available to
the evaluating processor.
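The archetypical entry can be written down directly as a C structure. In this sketch the opcode enumeration, MAXARGS, and uses_of are hypothetical; uses_of counts the instructions that refer to a given register, illustrating that a register address, like an opcode, may occur in many entries and so cannot by itself identify a member of the evaluation relationship.

```c
#include <stddef.h>

enum { MAXARGS = 3 };

/* Parent database 1: primitive operations, keyed by opcode. */
typedef enum { OP_LOAD, OP_ADD, OP_MUL } Opcode;

/* Parent database 2: addressable registers, keyed by address. */
typedef unsigned Address;

/* One entry of the intersection database, i.e. one instruction: the
   keyname of an operation plus the keyname (address) of each argument. */
typedef struct {
    Opcode  op;
    size_t  nargs;
    Address args[MAXARGS];
} Instruction;

/* Number of entries (instructions) that refer to register a. */
size_t uses_of(const Instruction *prog, size_t n, Address a)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < prog[i].nargs; j++)
            if (prog[i].args[j] == a) {
                count++;
                break;          /* count each instruction once */
            }
    return count;
}
```

For a three-instruction program in which register 3 is both an operand of an ADD and an operand of a following MUL, uses_of reports 2 for register 3, just as the opcode OP_ADD may itself appear in more than one entry.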


D. CONTROL STRUCTURE 

We have heretofore ignored the question of how the order
of execution of the evaluations is to be specified within
the program (the basic elements of which are now seen to be
entries in an intersection database). This order
corresponds to the logical access sequence of the set of
instructions. Thus, we may equate the ordinary notion of
the control structure of a program to the database-oriented
notion of a logical access structure for the program
database. The simplest access mechanism for a database is
to order it as a simple sequence. Under this protocol, the
elements of the database will be presented to the accessing
entity in a strictly invariant sequence.

Such an accessing structure is realized in such simple
programming systems as that of a keystroke-programmable
calculator. A sequence of keystrokes can be entered and
automatically reproduced at will, but there is no
possibility of automated branching.

Such programming systems are fundamentally limited in
mathematical computational power. The simplest modification
to such an access regime is to allow conditional branching,
so that a part of the instruction sequence may be repeated
or skipped, based on the contents of a register at the time
the branch is reached.

Machine and assembly-level programming systems, as well
as such high-level languages as BASIC and FORTRAN, are
organized on such a plan.
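Sequential access with conditional branching can be sketched as a small C interpreter. The three-instruction repertoire (ADDI, DECJNZ, HALT) is invented for illustration: instructions are fetched strictly in sequence except when DECJNZ finds a nonzero register, in which case control returns to an earlier instruction and part of the sequence repeats.

```c
#include <stddef.h>

/* Hypothetical instruction set for a sequentially accessed program. */
typedef enum { ADDI, DECJNZ, HALT } Op;

typedef struct {
    Op  op;
    int reg;      /* register operand */
    int imm;      /* immediate value (ADDI) or branch target (DECJNZ) */
} Instr;

/* Fetch in strict sequence (pc + 1) unless a conditional branch is
   taken: DECJNZ decrements a register and jumps to imm if the result is
   nonzero. Execution stops at HALT or on running off the end. */
void run(const Instr *prog, size_t n, int *regs)
{
    size_t pc = 0;
    while (pc < n) {
        const Instr *in = &prog[pc];
        switch (in->op) {
        case ADDI:
            regs[in->reg] += in->imm;
            pc++;
            break;
        case DECJNZ:
            if (--regs[in->reg] != 0)
                pc = (size_t)in->imm;   /* branch taken */
            else
                pc++;                   /* continue in sequence */
            break;
        case HALT:
            return;
        }
    }
}
```

With the program ADDI r0,5; DECJNZ r1,0; HALT and r1 initially 3, the first two instructions are executed three times and r0 ends at 15; without DECJNZ the same program could only run once, like a keystroke sequence on a calculator.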


E. STRUCTURED PROGRAMMING SYSTEMS 

The disadvantage of a sequential access mechanism is
that the resulting database does not have local integrity.
Instruction sequences which may be logically adjacent under
certain circumstances are not necessarily physically
adjacent. This access organization presents no real
disadvantages for the machine processor with a random-access
architecture, but can be quite confusing for the human
programmer. To render the program database more accessible
to the user, the notion of structured programming was
developed. This organizational technique consists of
organizing the access of a program database in a
hierarchical (tree-like) manner, so that program control
follows a hierarchical program structure which can be
expressed as a string generated by a context-free grammar
(and thus has an associated physically hierarchical
structure induced by the grammar). Such program control
facilities as functions and subroutines were the earliest
"structured" constructs. The syntax of such languages as
PASCAL and ALGOL, however, was consciously designed to
facilitate the expression of hierarchical control
structures, and to make the expression of a disordered,
sequential control structure less attractive than the use of
"structured" control operators. It is this historical
development which encourages us to hope that a
language-independent semantic tree structure may be built using a
grammar-driven tree editor. Basically, we note that it has
become a conscious design principle in the development of
structured programming languages to ensure that program
control flow follows the syntactic organization of the
language. The underlying sets of primitive operators of these
languages have a great deal in common. Language-dependent
primitives can be added to the set available to the processor and evaluated
without regard to the specific syntax by which they are
expressed, provided that the overall control structure of
such additional primitives is also hierarchically organized.


F. PHYSICAL REPRESENTATION OF A TREE-STRUCTURED PROGRAM
We are left with the problem of physically representing
a tree-structured program in a sequentially organized
physical memory space. The problems encountered are
precisely those encountered when attempting to implement any
hierarchically organized intersection set. They stem from
the requirement to refer, directly or indirectly, to the
entries in the parent databases from more than one place in
the intersection database. Two general strategies, each
with its own advantages and disadvantages, are currently in
use in database management systems.
1. Sequential Tree Representation 
This strategy is implemented by representing the
tree as a linear list of nodes and their contents in
preorder sequence. References to the parent databases are
embedded in the listing by keyname. The complexity of the
relationship implies that each such keyname must be repeated
many times throughout the list. Special delimiters are used
between node listings to indicate whether the next node is a
child, sibling, or uncle of the last. If one of the
keynames is to be changed, a search of the listing must be
made to find all of its occurrences. A second major
disadvantage is that in order to access any part of the
list, the list must be traversed sequentially from the
beginning. On the other hand, no pointers need occur
anywhere in the list, so that it can be moved about freely
from one place to another without change.
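A minimal C sketch of the sequential representation, assuming one concrete delimiter scheme: "(" before a child, "," before a sibling, and ")" to climb one level, so that ")," precedes an uncle. The TNode type and the serialize function are hypothetical. Leaves carry keynames, and the repetition of a keyname such as x shows why renaming requires a search of the whole listing.

```c
#include <stdio.h>
#include <string.h>

/* Minimal tree node: a label (an operator or, at a leaf, the keyname of
   a parent-database entry) and up to four children. */
typedef struct TNode {
    const char *label;
    int nchild;
    const struct TNode *child[4];
} TNode;

/* Append the tree to out in preorder. out must already hold a
   NUL-terminated string (initially empty); output is truncated safely
   if cap is too small. No pointers appear in the result. */
void serialize(const TNode *t, char *out, size_t cap)
{
    size_t len = strlen(out);
    snprintf(out + len, cap - len, "%s", t->label);
    if (t->nchild == 0)
        return;
    len = strlen(out);
    snprintf(out + len, cap - len, "(");          /* next node is a child */
    for (int i = 0; i < t->nchild; i++) {
        if (i > 0) {
            len = strlen(out);
            snprintf(out + len, cap - len, ",");  /* next node is a sibling */
        }
        serialize(t->child[i], out, cap);
    }
    len = strlen(out);
    snprintf(out + len, cap - len, ")");          /* climb one level */
}
```

For the tree of x := x + y, with an assign node over the leaf x and an add node, serialize produces assign(x,add(x,y)), a string that can be copied anywhere unchanged.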
2. Linked Representation of Trees 

Trees are represented in this strategy by nodes
linked together using pointer fields within each node. A
pointer is either the absolute address of the entity pointed
to, or an offset or array subscript which can be used by
routines in the system to calculate such an address. The
salient feature of a pointer reference is that it allows
reference by some mechanism which is independent of the
value of the referenced entity. Thus, the value of the
entity itself can be changed without changing all of the
references to it, which are still valid (provided, of
course, that the change is made without physically moving
the changed record). When the tree itself is represented by
means of nodes linked with pointers, it is common to link
the leaves of the tree to the parent databases with pointers
as well. It is assumed that a means exists to distinguish
such external links from the internal links defining the
tree structure itself. This representation has as one major
advantage the ability to be quickly traversed (by following
pointers). Another major advantage of this strategy is that
information in referenced databases need only be recorded
once, and can be changed without updating any pointers.
Deletion of information is somewhat more difficult, but can
be accomplished by constructing and maintaining cross
reference lists (inverted lists) which contain pointer
references to all nodes in the tree referring to a given
record in the parent database. The primary disadvantage of
such a representation is that the structure cannot be moved
or stored without a great deal of pointer modification. The
use of relative pointers is an inadequate solution, since
the consistency of references to the parent databases, which
need to be moved and managed as separate entities, must
still be maintained.
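The linked strategy can be sketched in C as follows, with the inverted list reduced to a simple reference count for brevity (a full inverted list would hold pointers back to the referring nodes). Symbol, LNode, make_leaf, and delete_leaf are hypothetical names. Changing a symbol's value in place is immediately visible through every leaf that points to it, and delete_leaf reports when the last reference disappears.

```c
#include <stdlib.h>

/* Parent-database record: a symbol table entry. Its value can change in
   place without touching any tree node that points to it. */
typedef struct Symbol {
    char name[16];
    int  value;
    int  refcount;           /* stands in for the inverted list */
} Symbol;

/* Linked tree node: internal links to children, plus an external link
   to a parent-database record at a leaf. */
typedef struct LNode {
    const char   *op;
    struct LNode *child[2];
    Symbol       *sym;       /* non-NULL only at leaves */
} LNode;

LNode *make_leaf(Symbol *s)
{
    LNode *n = calloc(1, sizeof *n);
    if (n != NULL) {
        n->op = "ref";
        n->sym = s;
        s->refcount++;        /* maintain the cross-reference */
    }
    return n;
}

/* Delete a leaf and update the cross-reference. Returns the number of
   leaves still referring to the symbol; at zero the caller may remove
   the record from the parent database. */
int delete_leaf(LNode *n)
{
    int remaining = --n->sym->refcount;
    free(n);
    return remaining;
}
```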
3. A Hybrid Strategy for Tree Representation

An examination of these characteristics indicates 
that the linked representation is preferable when changes 
are to be made to either the parent or tree databases, but 
that the sequential representation is preferable when the
database is to be transmitted from one location to another, 
or stored unchanged for a relatively long period of time. 
(Storage is equivalent to transmission from one time to 
another, and is thus logically the same problem as that of 
movement.) 

We conclude that the linked representation is an
appropriate representation for the program tree during
synthesis and evaluation, but that the program tree should
be moved (or stored on secondary storage) in sequential,
pointer-free format. Links to the parent databases are
converted from pointer references to reference by keyname.
The next section addresses the problem of how conversion
between the two representations can, in general terms, be
accomplished.


G. PROCEDURAL REPRESENTATION OF DATA

In order to incorporate these ideas into a feasible
design, we consider the facilities that would have to exist
in such a system. Since the program tree is to be operated
on in main memory with a linked representation, we may
assume that a data manipulation package exists which is
capable of synthesizing and maintaining all of the pointers
required to keep the linked structures coherent and
consistent. Consider the process of removing a sequentially
organized tree structure from secondary storage and loading
it into internal memory. This process must consist of
ordering a particular series of function activations with
particular arguments from the data manipulation package,
causing the desired structure to be built within physical
memory. The sequential representation is seen to be nothing
but a program for the data manipulation package, which is
itself a processor with a number of primitive operations.

Moreover, a strictly sequential control protocol for
this program is possible, given a reasonably powerful set of
primitives in the data manipulation package, since a tree
can be synthesized in strict pre-order sequence (the parent
for each child exists at the time of the child's synthesis).
We conclude that the appropriate secondary
representation for a program tree is as a sequential list of
instructions, to be translated by some simple interpreter
into a series of calls to the data manipulation package.
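A minimal C sketch of the onload side, assuming an instruction format of one "depth label" pair per node in pre-order. The dm_new_node and dm_add_child routines stand in for the data manipulation package and, like the format itself, are hypothetical. Because the listing is in pre-order, the interpreter only needs to remember the most recent node at each depth to find every parent.

```c
#include <stdio.h>
#include <stdlib.h>

enum { MAXKIDS = 4, MAXDEPTH = 16 };

/* Hypothetical data manipulation package: the only code that allocates
   nodes or stores pointers between them. */
typedef struct DNode {
    char label[16];
    int nkids;
    struct DNode *kid[MAXKIDS];
} DNode;

static DNode *dm_new_node(const char *label)
{
    DNode *n = calloc(1, sizeof *n);
    if (n != NULL)
        snprintf(n->label, sizeof n->label, "%s", label);
    return n;
}

static int dm_add_child(DNode *parent, DNode *child)
{
    if (parent->nkids == MAXKIDS)
        return -1;
    parent->kid[parent->nkids++] = child;
    return 0;
}

/* Onload interpreter: each "depth label" pair in src is one instruction.
   The parent of a node at depth d is the most recent node at depth d-1,
   which already exists. Returns the root, or NULL on malformed input
   (partially built nodes are not freed in this sketch). */
DNode *onload(const char *src)
{
    DNode *last[MAXDEPTH] = { 0 };   /* most recent node at each depth */
    DNode *root = NULL;
    char label[16];
    int depth, used;

    while (sscanf(src, "%d %15s%n", &depth, label, &used) == 2) {
        src += used;
        if (depth < 0 || depth >= MAXDEPTH)
            return NULL;
        if (depth == 0 ? root != NULL : last[depth - 1] == NULL)
            return NULL;                 /* second root, or no parent */
        DNode *n = dm_new_node(label);
        if (n == NULL)
            return NULL;
        if (depth == 0)
            root = n;
        else if (dm_add_child(last[depth - 1], n) != 0)
            return NULL;
        last[depth] = n;
        for (int k = depth + 1; k < MAXDEPTH; k++)
            last[k] = NULL;              /* deeper entries are now stale */
    }
    return root;
}
```

Reading from a string rather than a file is a simplification; the same loop would work over any input stream.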

The offload, or transmit, process consists of a pre-order
traversal of the linked representation, emitting the
appropriate instructions for recreating the skeleton of the
tree and filling in the contents of each node as it is
reached. At the same time, references can be removed from
the appropriate cross-reference lists, triggering removal of
the data item from the parent database when a reference
count of zero is reached. During onload, the skeletal
structure of the tree is recreated, and external references
in symbolic form are reloaded into the appropriate parent
database. Pointer and cross-reference list creation and
maintenance is performed automatically by the preexisting
data manipulation package.
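The offload side can be sketched as a pre-order traversal in C. The node layout, the "depth label" output format, and the reference-count bookkeeping are assumptions made for illustration: each leaf pointer into the parent database is replaced by that entry's keyname, and the entry's count is decremented so that entries no longer referenced can be identified for removal.

```c
#include <stdio.h>
#include <string.h>

/* Parent-database record referenced from leaves (hypothetical). */
typedef struct Sym {
    const char *name;        /* keyname written to the sequential form */
    int refcount;
} Sym;

typedef struct ONode {
    const char   *label;     /* operator at interior nodes */
    Sym          *sym;       /* non-NULL at leaves that reference a symbol */
    int           nkids;
    struct ONode *kid[4];
} ONode;

/* Offload: pre-order traversal appending one "depth label" line per node
   to out (a NUL-terminated buffer of capacity cap). A symbol pointer is
   written as the symbol's keyname and its count is decremented; *freed
   counts the symbols whose count reaches zero. */
void offload(ONode *n, int depth, char *out, size_t cap, int *freed)
{
    size_t len = strlen(out);
    const char *text = n->sym ? n->sym->name : n->label;
    snprintf(out + len, cap - len, "%d %s\n", depth, text);
    if (n->sym != NULL && --n->sym->refcount == 0)
        (*freed)++;
    for (int i = 0; i < n->nkids; i++)
        offload(n->kid[i], depth + 1, out, cap, freed);
}
```

Offloading the tree of x := x + y, whose two x leaves share one symbol entry, emits the lines 0 assign, 1 x, 1 add, 2 x, 2 y and drives both symbol counts to zero.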

The secondary representation can thus be viewed either
as data, representing the tree in linear format, or as a
program for the data structure manipulation package which
will cause a logically equivalent tree to be reconstructed
in available memory.

As a beneficial side effect, if the capability is
installed to allow the onload and offload translators to
read from or write to strings in main memory, the described system
provides an easy way to copy or move subtrees, as well as to
encode tree-building templates efficiently. In fact, the
proposed mechanism becomes the method of choice for any and
all movement of tree structures from one location or time to
another, since the data in the transmitted stream is
entirely logical, containing no reference to any
implementation details. The process would even allow
internal representations to be transmitted from one
installation to another with a completely different
implementation, since all implementation-dependent data is
removed during the offload process and reinserted during the
onload process.


H. SUMMARY 

In this section we have viewed programs as specialized
databases, and have found that standard database models
correspond nicely to various programming language styles.
Two fundamental conclusions have been reached. The first is
that it seems very likely that grammar-driven tree editors
can be used to produce trees representing the control
structures for common programming languages in a
syntax-independent, directly evaluable format. This hope is based
on the direct expression of hierarchical control structures
by the syntactic hierarchy implicit in the defining grammars
of current programming languages, and the recognition that a
small set of such control structures provides the common
base for current language design.

The second result, which addresses a technical
problem, is that the appropriate format for such program trees
is in linked form when the tree is undergoing modification,
and as a sequential, procedural, pointer-free list of
instructions when the tree is being stored, or transmitted
from one point to another.





V. A PROTOTYPE SYSTEM DESIGN


In this section, the design for a prototype system
demonstrating the feasibility of the ideas developed in
previous chapters is described. Since the implementation of
the described system is, at present, incomplete, the design
is presented only in broad outline. A full description of
the demonstration prototype will be provided as a Technical
Report when the initial implementation is complete.

The approach taken is to first describe a complete system
for a grammar-driven, language-independent programming
environment, and then select a subsystem for implementation
as a prototype feasibility study. The prototype subsystem
will be used to generate statistics concerning memory size
and computational efficiency, as well as to refine the user
interface, with the possibility remaining of extending the
prototype to a more complete implementation at a future
time.

A basic block diagram of the complete system is provided as
Figure 5.


A. SYSTEM MODULES

The proposed system consists of the following modules:

1. Data Structure Support Module.

This module contains packages of functions, each
package implementing a specific abstract data type needed by
the remainder of the system. At a minimum, the abstract
data type packages needed include one supporting an
indefinite number of indefinitely large association lists
(to represent the contents of tree nodes), and one
supporting general ordered trees, optimized toward
reasonably efficient traversal in all directions. In
addition, the tree support package must include a facility
for linking the leaves of trees to other data items, such as
strings, symbol table entries, numerical contents, and so
on. Each tree node (internal as well as leaf) must be
linkable to an association list representing the contents of
the node.
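The association list package might be sketched as a singly linked list of string pairs. Assoc, assoc_get, assoc_put, and assoc_free are hypothetical names, and storing all values as strings is a simplification; the design choice here is that rebinding an existing key replaces its value in place rather than shadowing it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical association list: an unbounded chain of (attribute,
   value) pairs, one list per tree node. */
typedef struct Assoc {
    char         *key;
    char         *value;
    struct Assoc *next;
} Assoc;

static char *dupstr(const char *s)
{
    char *d = malloc(strlen(s) + 1);
    if (d != NULL)
        strcpy(d, s);
    return d;
}

/* Return the value bound to key, or NULL if there is none. */
const char *assoc_get(const Assoc *list, const char *key)
{
    for (; list != NULL; list = list->next)
        if (strcmp(list->key, key) == 0)
            return list->value;
    return NULL;
}

/* Bind key to value, replacing an existing binding. Returns the
   (possibly new) head; on allocation failure returns NULL and leaves
   the original list unchanged. */
Assoc *assoc_put(Assoc *list, const char *key, const char *value)
{
    for (Assoc *p = list; p != NULL; p = p->next)
        if (strcmp(p->key, key) == 0) {
            char *v = dupstr(value);
            if (v == NULL)
                return NULL;
            free(p->value);
            p->value = v;
            return list;
        }
    Assoc *a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->key = dupstr(key);
    a->value = dupstr(value);
    if (a->key == NULL || a->value == NULL) {
        free(a->key); free(a->value); free(a);
        return NULL;
    }
    a->next = list;
    return a;
}

void assoc_free(Assoc *list)
{
    while (list != NULL) {
        Assoc *next = list->next;
        free(list->key); free(list->value); free(list);
        list = next;
    }
}
```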

In addition to supporting tree and association list
data types, this module is responsible for supporting any
additional data types for which the need arises and which
are not supported directly by the language used for
implementation. (In particular, the implementation
currently being developed requires a very primitive string
table which serves as a rudimentary symbol table.)

2. Grammar-Driven Environment Module. 

This module provides an editor-like interface for
the user. It translates user commands into appropriate
system actions, which include editing functions, directives
to evaluate a particular program structure, and movement of
Abstract Syntax Trees from secondary to primary storage and
back again. A major component of this module is the
grammar-driven synthesizer itself.

3. Memory Management Module. 

This module comprises the actual system primary
memory itself, which is used to store the LD (Language
Description) and AST (Abstract Syntax Tree) currently in
use. In addition, the primary memory module contains the
data structures being manipulated by the Data Structure
Support Module.

4. File Management Module. 

This module implements a single-user workspace on
secondary storage which contains all of the LD's available
to the user, as well as all of the AST's which may have been
previously created and saved. These components are stored
in sequential, pointer-free format as discussed in Chapter
IV.

5. Input/Output Manifolds.

These modules manage the system input and output
streams, which may be redirected as required by components
of the system (including the user) to various physical
devices. The input stream may be taken from the keyboard, a
file on secondary storage, or a string in primary storage.
This assignment may be changed dynamically during the
operation of the system. Similarly, the output stream may
be dynamically directed to the CRT, a string in primary
storage, or a file on secondary storage. (The term
"manifold" is used to suggest that these functions may be
thought of as three-position switches, the setting of which
may be changed at will during system operation.)
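The three-position switch can be sketched in C for the input side. InManifold, in_select, and in_getc are hypothetical names; the keyboard position simply reads the standard input, which under UNIX is itself a file. Every consumer reads through in_getc and so is unaffected when the switch is reset during operation.

```c
#include <stdio.h>

/* Hypothetical input manifold: a three-position switch selecting where
   the system's input stream comes from. */
typedef enum { SRC_KEYBOARD, SRC_FILE, SRC_STRING } Source;

typedef struct {
    Source      src;
    FILE       *fp;     /* used for SRC_KEYBOARD (stdin) and SRC_FILE */
    const char *str;    /* used for SRC_STRING */
    size_t      pos;
} InManifold;

/* Reset the switch; may be called at any time during operation. */
void in_select(InManifold *m, Source src, FILE *fp, const char *str)
{
    m->src = src;
    m->fp  = (src == SRC_KEYBOARD) ? stdin : fp;
    m->str = str;
    m->pos = 0;
}

/* Single read entry point for all consumers, whatever the switch
   setting. Returns the next character, or EOF at end of input. */
int in_getc(InManifold *m)
{
    if (m->src == SRC_STRING) {
        unsigned char c = (unsigned char)m->str[m->pos];
        if (c == '\0')
            return EOF;
        m->pos++;
        return c;
    }
    return getc(m->fp);
}
```

The output manifold would be the mirror image, with a put function writing to the CRT (standard output), a file, or a string buffer.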


6. Onload and Offload Translators.


These modules, controlling the Data Structure
Support facilities, convert the sequential data
representations stored on secondary storage to the linked
representation needed when an LD or AST is loaded into
primary memory, and vice versa. As a secondary feature,
since the input and output streams may originate from or be
directed to internal strings, these modules can be used to
"quote" or "unquote" tree structures, as when a template is
translated into an actual subtree replacement.


B. PRE-EXISTING MODULES

The current implementation is being made using the C
Programming Language on a PDP-11 with the UNIX Operating
System. (UNIX is a trademark held by Bell Laboratories,
Inc.) This software combination provides a C-accessible
interface to memory and file management facilities. In
addition, a complete library of string handling and
input/output functions is available. In consequence, the
memory and file management modules described above may be
thought of as already in existence, for the purpose of
describing the prototype subsystem. In addition, keyboard
and CRT interfaces are already operational: under the UNIX
operating system, hardware interfaces are mapped into the
system as files with conversion routines provided
transparently. Thus, for the Input/Output Manifold module
we need only provide a means of diverting the input and
output streams from one file to another, or to main memory.


C. SUBSYSTEM SELECTION

Given the broad outline of system module function
provided above, a minimally capable prototype subsystem can
be selected for initial implementation. Such a subsystem
must be capable of initialization, synthesis, display and
storage of an AST in order to demonstrate convincingly the
feasibility of the concepts outlined in previous chapters.
Facilities to evaluate (execute), revise, and debug
previously entered AST's may be deferred, as may the
facility to easily install a new Language Definition.
Therefore, the capabilities provided by each of the modules
in the prototype subsystem may be redefined as follows:

1. Data Structure Support Module. 

Full packages supporting general ordered trees and
association lists are needed. In addition, a primitive
capability to store and reference string values is needed.
The capability to support sophisticated symbol table
structures may be deferred until semantic information is
needed to allow execution of AST structures.

2. Grammar-Driven Environment Module.

The only major capability required by the prototype
subsystem is the "append" function, which can be used to
create AST structures. In addition, a working display
mechanism with simple cursor control facilities is needed.
A frame-oriented display mode is satisfactory for the
prototype system (although eventually a screen-oriented
display driver would be desirable). Finally, facilities for
storing and retrieving AST's to and from secondary storage,
as well as a facility (however cumbersome) for installing
new language definitions, are needed.

3. Input/Output Manifolds.

These modules need to be implemented in full, in
order that secondary storage may be used and in order to
allow templates existing in primary memory to appear in the
input stream for processing by the Onload translator.

4. Onload and Offload Translators.

These components also must be fully implemented, for
the same reason as the Input/Output Manifolds. The
implementation must be flexible enough so that, as more
sophisticated data structure packages are added, the
sequential representation syntax can be extended to
accommodate onload and offload of keyfields in the new
structures.







5. Bootstrap Procedure.

The system can be initialized as follows. We
currently regard Language Definitions as being written in
one of three languages, or notational systems: a high-level
format (which is to consist of R-ARGOT notation with display
and semantic specification extensions), an intermediate-level
format (the notation developed in Chapter III), and a
low-level format (the sequentialized, pointer-free
representation of an internal tree corresponding to the
desired LD, using the language alluded to in Chapter IV).

There is no fundamental difference between the 
intermediate and low-level formats, since they represent two 
alternative representations for the same database. 
Translation from one format to the other is performed 
automatically by the onload and offload translators when 
this database is moved to and from secondary storage. 

In order to bootstrap the system, once all of the
modules have been compiled and linked, it is necessary only
to translate manually an intermediate-level description of
the intermediate-level language into the corresponding
low-level description, and to install the resulting text as
a file accessible to the system using a conventional editor.

At this point, the system facilities can be actuated
to load the file as a language description into system
primary memory. During the load, the onload translator will
convert the description into a linked representation of the
database needed to describe and guide the synthesis of new
language descriptions in the intermediate format. That is,
the system itself can now be used, as a grammar-driven
editor, to create additional language descriptions.
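The division of labor between the onload and offload translators can be illustrated with a minimal C sketch. The actual sequential syntax is the one alluded to in Chapter IV, which is not reproduced here; purely as an assumption, the sketch borrows the punctuation of the transformation templates of Appendix C, writing each node as OP,rule with its children enclosed in parentheses and separated by semicolons. The node structure and the offload and onload functions are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical linked AST node in first-child / next-sibling form. */
struct node {
    char op[8];          /* operator, e.g. "HEAD", "NT", "COPT" */
    char rule[24];       /* rule name                            */
    struct node *child;  /* leftmost child                       */
    struct node *sib;    /* right sibling                        */
};

/* Offload: write the tree as pointer-free text, e.g. HEAD,c(NT,r1;COPT,r2). */
void offload(const struct node *n, FILE *out)
{
    const struct node *c;

    fprintf(out, "%s,%s", n->op, n->rule);
    if (n->child == NULL)
        return;
    fputc('(', out);
    for (c = n->child; c != NULL; c = c->sib) {
        offload(c, out);
        if (c->sib != NULL)
            fputc(';', out);
    }
    fputc(')', out);
}

/* Copy characters up to (not including) any stop character; truncate at cap. */
static const char *scan(const char *s, const char *stops, char *dst, size_t cap)
{
    size_t i = 0;

    while (*s != '\0' && strchr(stops, *s) == NULL) {
        if (i + 1 < cap)
            dst[i++] = *s;
        s++;
    }
    dst[i] = '\0';
    return s;
}

/* Onload: rebuild the linked tree from text; *p is advanced past what was read. */
struct node *onload(const char **p)
{
    struct node *n = calloc(1, sizeof *n);
    const char *s;

    assert(n != NULL);
    s = scan(*p, ",", n->op, sizeof n->op);
    if (*s == ',')
        s++;
    s = scan(s, "(;)", n->rule, sizeof n->rule);
    if (*s == '(') {
        struct node **link = &n->child;
        s++;
        for (;;) {
            *link = onload(&s);        /* read one child subtree */
            link = &(*link)->sib;
            if (*s != ';')
                break;
            s++;
        }
        if (*s == ')')
            s++;
    }
    *p = s;
    return n;
}
```

Because the offloaded text contains no pointers, it can be kept in a file, prepared with a conventional editor as in the bootstrap step above, and onloaded again into an equivalent linked tree. Freeing memory and recovering from malformed text are left out of the sketch.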
VI. SUMMARY.


A. CONCLUSIONS.

In the preceding chapters, a conceptual foundation for
the interactive creation of databases, structured
hierarchically according to a given context-free grammar,
has been provided. The primary conclusions supported by
this work are:

1. A basic model for the described process is that of a
valid sentential form generator, rendered determinate by
allowing for the interactive selection of which production
to apply and at which point in the already-derived structure
the selected substitution is to be made.

2. Notations exist (e.g. the R-ARGOT notation) for the
specification of general context-free grammars which are
both human-oriented and directly interpretable as the
knowledge base for such a system.

3. The basic mechanism correctly interprets ambiguous
or incomplete grammars, and also allows for the
synthesis of correctly labeled incomplete derivations.

4. Analogous mechanisms can be described which derive
and display not strings, but derivation trees which are
morphisms of validly derived strings under the specified
grammar.

5. The grammatical notation can be transformed into
context-independent operation codes with arguments which can
be stored in the leaf nodes of the derived tree in such a
way that subsequent synthesis proceeds correctly, and
subtree deletion can be efficiently and consistently
performed without examination of the surrounding context in
the tree.

6. The resulting derivation trees can be used to encode
semantic information in such a way that the trees can be
evaluated correctly without further reference to the
syntactic, as opposed to physical, structure of the tree.
(This assertion is a speculation, not a firm conclusion.)

7. A method exists for storing such structures in such
a way that their consistency does not depend on any external
data structures save the language definition itself.
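Conclusions 1 and 3 can be made concrete with a small C sketch of a sentential form generator in which the user, rather than a parser, chooses both the production and the position at which it is applied. The production and form structures and the apply function are illustrative assumptions, not the data structures of this work. A choice is rejected when the production does not rewrite the symbol found at that position, so every form that can be built is a valid sentential form, and unexpanded nonterminals simply remain in place as an incomplete derivation.

```c
#include <assert.h>
#include <string.h>

#define MAX_FORM 32   /* longest sentential form kept (arbitrary) */
#define MAX_RHS 4     /* longest right-hand side (arbitrary)      */

/* A production lhs -> rhs[0] ... rhs[n-1]; grammar symbols are plain names. */
struct production {
    const char *lhs;
    const char *rhs[MAX_RHS];
    int n;
};

/* A sentential form: the current string of terminals and nonterminals. */
struct form {
    const char *sym[MAX_FORM];
    int len;
};

/* Apply production p at position pos, both chosen interactively.
   Returns 1 if the substitution was made, 0 if the choice is rejected
   (bad position, p does not rewrite the symbol there, or no room). */
int apply(struct form *f, int pos, const struct production *p)
{
    int i, grow = p->n - 1;

    assert(p->n >= 0 && p->n <= MAX_RHS);
    if (pos < 0 || pos >= f->len)
        return 0;
    if (strcmp(f->sym[pos], p->lhs) != 0)
        return 0;
    if (f->len + grow > MAX_FORM)
        return 0;
    /* Move the symbols after pos so that the rhs fits exactly in place. */
    memmove(&f->sym[pos + p->n], &f->sym[pos + 1],
            (size_t)(f->len - pos - 1) * sizeof f->sym[0]);
    for (i = 0; i < p->n; i++)
        f->sym[pos + i] = p->rhs[i];
    f->len += grow;
    return 1;
}
```

For example, starting from S, applying S -> if E then S at position 0 and then S -> x at the last position yields "if E then x", with E still awaiting expansion; an attempt to apply S -> x at position 0 (where "if" stands) is refused.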


B. WORK IN PROGRESS.

Implementation of the prototype subsystem is currently
in progress, with no difficulties currently foreseen. The
only module awaiting final coding and test is the
Grammar-Driven Environment module itself, and the algorithmic
specification of the functions needed has already been
accomplished. Provided that no further difficulties are
encountered, a complete description of the prototype
subsystem will later be provided as a Naval Postgraduate
School Technical Report.







The prototype subsystem code is oriented toward a
demonstration of technical feasibility, as opposed to storage
or execution time efficiency. However, it has been written
in a highly modularized manner, so that after
instrumentation and performance measurement, appropriate
modifications can be made fairly easily. An attempt has
been made to provide for the extension of the prototype
system to a more complete realization of the original system
design.


C. FUTURE RESEARCH DIRECTIONS.

After completion of the prototype subsystem, two
directions are indicated for future investigation.

1. Extension of the Prototype Subsystem.

a. Symbol Table Implementation.

A generalized symbol table data type must be
defined which will adequately support a wide range of
programming languages.

b. Semantic Action Implementation.

A class of primitive operations (including
access facilities to the defined symbol table structure)
must be formulated, provision made for language-implementer
definition of additional primitives, and an AST interpreter
written.

c. Pattern-Matching.

A pattern-matching facility should be provided
as part of the user interface as a sophisticated means of
cursor control. A fairly simple pattern-matching
capability, when combined with the pre-existing capability
to access the AST in a syntax-oriented way, would allow the
user to search and access the structure in very
sophisticated ways; e.g. such commands as "find the next
occurrence of an assignment to identifier a" could easily be
formulated. Moreover, when combined with a relatively
straightforward debug facility (for example, the setting of
breakpoints), a very high-level program test facility could
be provided.

d. High Level Language Descriptions. 

The high-level format for both syntactic and
semantic language specification should be formulated and
implemented as a more convenient means for implementing new
languages.

e. Debugging Tools.

Provisions should be made to allow the user to
set breakpoints, access the current data environments, and
order step-by-step execution modes from the editor.

f. Dynamic Language Changes.

The feasibility of allowing language changes to
be made dynamically during AST creation or execution, at
points specifiable in the language definition, should be
investigated. Related to this problem is the provision of a
facility to link (perhaps dynamically) one AST to another.

g. Increased Storage Efficiency.

Once basic design parameters that are now indefinite
(such as the number of primitive operations) are made final,
the desirability of packing data fields into AST nodes,
rather than using the space-inefficient association list
implementation, and the resulting impact on time efficiency,
should be studied.

h. Full User Interface.

Deferred edit functions, such as delete and
insert, should be installed in the Grammar-Driven
Environment Module.
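The pattern-matching command quoted under item c can be sketched in C as a preorder search that begins just after the cursor. The ast node layout (with father links), the helper names, and the assumption that an assignment's first child is the identifier being assigned are all hypothetical simplifications; in the Pascal grammar of Appendix B the target would be a variable node wrapping the identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical AST node with father links for cursor movement. */
struct ast {
    const char *rule;    /* e.g. "assignment", "identifier" */
    const char *text;    /* token text for leaves, else NULL */
    struct ast *parent, *child, *sib;
};

/* Next node in preorder (document order), or NULL past the last node. */
struct ast *preorder_next(struct ast *n)
{
    if (n->child)
        return n->child;
    while (n) {
        if (n->sib)
            return n->sib;
        n = n->parent;
    }
    return NULL;
}

/* Pattern: an assignment whose target identifier is name. */
int assigns_to(const struct ast *n, const char *name)
{
    const struct ast *target = n->child;

    return strcmp(n->rule, "assignment") == 0 && target != NULL &&
           strcmp(target->rule, "identifier") == 0 &&
           target->text != NULL && strcmp(target->text, name) == 0;
}

/* "Find the next occurrence of an assignment to identifier name". */
struct ast *find_next_assignment(struct ast *cursor, const char *name)
{
    struct ast *n;

    for (n = preorder_next(cursor); n != NULL; n = preorder_next(n))
        if (assigns_to(n, name))
            return n;
    return NULL;
}

/* Helper for building trees: append c as the last child of p. */
void add_child(struct ast *p, struct ast *c)
{
    struct ast **link = &p->child;

    while (*link)
        link = &(*link)->sib;
    *link = c;
    c->parent = p;
}
```

Other patterns (for example, every use of an identifier, or every loop whose body is still unexpanded) would need only a different predicate in place of assigns_to; the cursor-advancing search stays the same.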

2. Additional Applications for the Technology.

The conceptual framework provided by this paper is
sufficiently general to support unexpected applications in
areas quite distant from the field of programming
environment design. A few such applications are suggested
below:

a. Generalized Editing.

Generalized editors, as described in [Fraser
1980], are editors which provide for the manipulation and
display of data structures other than text files. The
mechanism is well-suited for the direct editing of a
hierarchically organized database of any type.

b. Sparse Programming Languages.

Current programming languages are designed with
a parser-based implementation as a fundamental assumption.
For that reason, they typically include many keyword and
punctuation symbols which are irritating, because
superfluous, to human users. Because the described
technology can utilize ambiguous grammars, sparse languages
with the minimum amount of punctuation needed for human
comprehensibility can be described, which could be
implemented using grammar-driven synthesis as the
fundamental input mechanism. In fact, improved performance
from the synthesizer could be expected for such a
"pseudo-code"-like language, since the inherent semantic
density of the derivation tree could be made very high.

c. Artificial Intelligence Applications.

In the described design, considerable pains have
been taken to provide a simple, uniform method for grammar
rule and point-of-application selection, suitable for use by
a human operator. There is no fundamental reason, however,
why very complicated heuristic methods could not be used to
select the rule to be applied and the place in the current
structure where the application is to be made. For instance,
a production system (in the Artificial Intelligence sense)
could be used to perform this function. The resulting
hybrid system would have a heuristic front end and an
algorithmic back end, with the desirable property that
whatever structure the heuristic front end attempted to
build, the resulting structure would always be guaranteed to
be correct in terms of the "deep structure" specified by the
language description. Attempts by the heuristic module to
perform inconsistent modifications would be detected,
prevented, and reported by the synthesis module. A
knowledge representation based on such a system would be
able to interact with the user in very irregular, and
occasionally incorrect, ways, while preserving a fundamental
internal database with guaranteed consistency.

APPENDIX A: NOTATIONAL SYSTEMS FOR CONTEXT-FREE GRAMMARS

1. BACKUS-NAUR FORMAT (in R-ARGOT)

context-free-grammar: + production .
production: non-terminal "::=" [ right-hand-side ] .
right-hand-side: + construct .
construct: { terminal ; non-terminal } .
non-terminal: "<" string ">" .
terminal: string .

We assume that "string" is a sequence of any appropriate
character set not including the metasymbols.

Note that this notation is in itself a regular language.


2. ARGOT NOTATION (in R-ARGOT)

ARGOT: + rule .
rule: rule-name ":" concatenation .
concatenation: + sub-expression .
sub-expression: { optional-iteration
                ; simple-iteration
                ; list-iteration
                ; option
                ; alternation
                ; optional-alternation
                ; rule-name
                ; terminal
                ; group
                } .
optional-iteration: "*" sub-expression .
simple-iteration: "+" sub-expression .
list-iteration: "#" sub-expression sub-expression "..." .
option: "[" concatenation "]" .
alternation: "{" concatenation ";" alternatives "}" .
optional-alternation: "[" concatenation ";" alternatives "]" .
alternatives: # concatenation ";" ... .
group: "(" concatenation ")" .
terminal: """ string """ .
rule-name: string .

("string" is taken to be a predefined rule.)


3. R-ARGOT (in R-ARGOT)

R-ARGOT: + rule .
rule: rule-name ":" expression .
expression: { concatenation ; iteration ; list-iteration ; alternation } .
concatenation: + field .
iteration: "+" rule-name .
list-iteration: "#" rule-name field "..." .
alternation: "{" rule-name ";" alternatives "}" .
alternatives: # rule-name ";" ... .
field: { rule-name ; option ; terminal } .
option: "[" rule-name "]" .
terminal: """ string """ .
rule-name: string .

Note that this notation is, in itself, a regular
language.
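Because an R-ARGOT rule takes only one of four forms, a grammar written in this notation is easy to hold as data. The following C sketch is one possible in-memory encoding under assumed names (field, rule, render), not the representation used in this work; it renders a rule back into R-ARGOT text, the kind of round trip a grammar-driven editor needs when it displays the knowledge base it is interpreting.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory form of an R-ARGOT rule. */
enum field_kind { F_RULE, F_OPTION, F_TERMINAL };
enum rule_kind { R_CONCAT, R_ITER, R_LIST, R_ALT };

struct field {
    enum field_kind kind;
    const char *name;           /* rule name or terminal text */
};

struct rule {
    const char *name;
    enum rule_kind kind;
    const struct field *parts;  /* concat: fields; iter: element;
                                   list: element, separator; alt: choices */
    int n;
};

/* Append text to out (bounded by cap), keeping it NUL-terminated. */
static void append(char *out, size_t cap, const char *text)
{
    size_t used = strlen(out);

    if (used + 1 < cap)
        strncat(out, text, cap - used - 1);
}

/* Write one field: name, [ name ], or "name". */
static void append_field(char *out, size_t cap, const struct field *f)
{
    if (f->kind == F_OPTION) {
        append(out, cap, "[ ");
        append(out, cap, f->name);
        append(out, cap, " ]");
    } else if (f->kind == F_TERMINAL) {
        append(out, cap, "\"");
        append(out, cap, f->name);
        append(out, cap, "\"");
    } else {
        append(out, cap, f->name);
    }
}

/* Render a rule back into R-ARGOT text. */
void render(const struct rule *r, char *out, size_t cap)
{
    int i;

    assert(cap > 0);
    out[0] = '\0';
    append(out, cap, r->name);
    append(out, cap, ": ");
    switch (r->kind) {
    case R_ITER:
        append(out, cap, "+ ");
        append_field(out, cap, &r->parts[0]);
        break;
    case R_LIST:
        append(out, cap, "# ");
        append_field(out, cap, &r->parts[0]);
        append(out, cap, " ");
        append_field(out, cap, &r->parts[1]);
        append(out, cap, " ...");
        break;
    case R_ALT:
        append(out, cap, "{ ");
        for (i = 0; i < r->n; i++) {
            if (i > 0)
                append(out, cap, " ; ");
            append_field(out, cap, &r->parts[i]);
        }
        append(out, cap, " }");
        break;
    case R_CONCAT:
        for (i = 0; i < r->n; i++) {
            if (i > 0)
                append(out, cap, " ");
            append_field(out, cap, &r->parts[i]);
        }
        break;
    }
    append(out, cap, " .");
}
```

For example, the Pascal rule if-statement of Appendix B would be held as a five-field concatenation and rendered back as: if-statement: "if" expression "then" statement [ else-part ] .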

APPENDIX B: A GRAMMAR FOR PASCAL IN R-ARGOT

PASCAL: "program" identifier "(" name-list ")" ";" block "." .
block: [ labels ] [ constants ] [ types ] [ variables ]
       [ subroutines ] "begin" statements "end" .
labels: "label" integers ";" .
constants: "const" c-decls ";" .
types: "type" t-decls ";" .
variables: "var" v-decls ";" .
subroutines: + s-decl .
integers: # integer "," ... .
c-decls: # c-decl ";" ... .
c-decl: identifier "=" constant .
t-decls: # t-decl ";" ... .
t-decl: identifier "=" type .
v-decls: # v-decl ";" ... .
v-decl: name-list ":" type .
name-list: # identifier "," ... .
s-decl: { p-decl ; f-decl } .
p-decl: "procedure" identifier [ parameters ] ";" block ";" .
f-decl: "function" identifier [ parameters ] ":" identifier ";"
        block ";" .
parameters: "(" param-list ")" .
param-list: # param-section ";" ... .
param-section: { f-params
               ; v-params
               ; p-params
               ; c-params
               } .
f-params: "function" name-list ":" identifier .
v-params: "var" name-list ":" identifier .
p-params: "procedure" name-list .
c-params: name-list ":" identifier .
type: { scalar-type
      ; subrange-type
      ; pointer-type
      ; set-type
      ; array-type
      ; record-type
      ; file-type
      ; identifier
      } .
scalar-type: "(" name-list ")" .
subrange-type: constant ".." constant .
pointer-type: "^" identifier .
set-type: [ packed ] "set" "of" simple-type .
array-type: [ packed ] "array" "[" subscripts "]" "of" type .
record-type: [ packed ] "record" [ field-list ] "end" .
file-type: [ packed ] "file" "of" type .
packed: "packed" .
simple-type: { identifier ; scalar-type ; subrange-type } .
field-list: { var-fields ; mixed-fields } .
mixed-fields: fixed-fields [ and-var-fields ] .
and-var-fields: ";" var-fields .
fixed-fields: # fixed-field ";" ... .
fixed-field: name-list ":" type .
var-fields: "case" [ tag ] identifier "of" variants .
variants: # variant ";" ... .
variant: constant-list ":" "(" [ field-list ] ")" .
constant-list: # constant "," ... .
statements: # statement ";" ... .
statement: [ integer ] action .
action: { assignment
        ; procedure-call
        ; compound
        ; if-statement
        ; repeat
        ; while
        ; for
        ; case-statement
        ; goto
        ; with
        } .
assignment: variable ":=" expression .
procedure-call: identifier [ arguments ] .
arguments: "(" arglist ")" .
arglist: # argument "," ... .
argument: { identifier ; expression } .
compound: "begin" statements "end" .
if-statement: "if" expression "then" statement [ else-part ] .
else-part: "else" statement .
repeat: "repeat" statements "until" expression .
while: "while" expression "do" statement .
for: "for" identifier ":=" expression t-or-d expression "do"
     statement .
t-or-d: { downto ; to } .
downto: "downto" .
to: "to" .
case-statement: "case" expression "of" cases "end" .
cases: # case ";" ... .
case: constant-list ":" statement .
with: "with" variables "do" statement .
goto: "goto" integer .
variables: # variable "," ... .
expression: { lt
            ; lte
            ; eq
            ; gte
            ; gt
            ; neq
            ; in
            ; s-expression
            } .
lt: s-expression "<" s-expression .
lte: s-expression "<=" s-expression .
eq: s-expression "=" s-expression .
gte: s-expression ">=" s-expression .
gt: s-expression ">" s-expression .
neq: s-expression "<>" s-expression .
in: s-expression "in" s-expression .
s-expression: [ sign ] u-expression .
sign: { plus-sign ; minus-sign } .
plus-sign: "+" .
minus-sign: "-" .
u-expression: { plus
              ; minus
              ; or
              ; term
              } .
plus: term "+" term .
minus: term "-" term .
or: term "or" term .
term: { times
      ; quot
      ; div
      ; mod
      ; and
      ; factor
      } .
times: factor "*" factor .
quot: factor "/" factor .
div: factor "div" factor .
mod: factor "mod" factor .
and: factor "and" factor .
factor: { group
        ; not
        ; set
        ; v-or-c
        } .
group: "(" expression ")" .
not: "not" factor .
set: "[" [ set-members ] "]" .
set-members: # set-member "," ... .
set-member: { range ; expression } .
range: expression ".." expression .
v-or-c: { unsigned-constant ; variable } .
variable: identifier [ modifiers ] .
modifiers: + modifier .
modifier: { subscript ; field-reference ; indirection } .
subscript: "[" expressions "]" .
field-reference: "." identifier .
indirection: "^" .
expressions: # expression "," ... .

It is assumed that predefined input scanners exist for
the rule names "integer", "identifier", "constant", and
"unsigned-constant".


APPENDIX C: TRANSFORMATION TEMPLATE GRAMMAR 


The following grammar defines symbol strings which are
interpreted as calls to tree-building and node-modifying
routines whose existence is assumed, as is the interpreter
which makes those calls. Also implicit in the following
definitions and discussion is the notion of a "current node",
defined, for the purpose of the application of templates, to
be any free node in an AST.

template: { subtree ; siblist } .
subtree: boundnode [ childlist ] .
childlist: "(" siblist ")" .
siblist: # freenode ";" ... .
boundnode: boundop rulefield .
freenode: freeop rulefield .
rulefield: "," rulename .
boundop: { HEAD ; ITER ; LIST ; pdf } .
pdf: { (predefined functions) } .
freeop: { NT ; ALT ; COPT ; IOPT ; LOPT ; TERM } .
rulename: { (grammar rulenames) ; (predefined rulenames) } .

The Template Grammar produces operator and rulename
pairs, both bound and free, punctuated by the terminal
symbols "(", ";", "," and ")", which are interpreted as
follows:

"(": Create a child node under the current node, make
the node created the current node, and overwrite the OP
field with the operator listed next.

";": Create a right sibling of the current node, make
the node created the current node, and overwrite the OP
field with the operator listed next.

",": Overwrite the RULE field of the current node with
the rulename listed next.

")": Make the father of the current node the new
current node.

The first symbol of every template is an operator, either
free or bound, which overwrites the OP field of the
current node. The current node is the only node in the AST
which is modified in any way by a template; new nodes may
be created, but always within the context of the current
node.

The templates defined by this grammar allow definition
of the transformations in Chapter III. The following
examples illustrate the various constructions most commonly
encountered.

1. Single node replacement, rule field unchanged:

Transformation:
NT,a => ALT,a
Template:
ALT,a

2. Single node replacement, operator and rulename modified:

Transformation:
ALT,a => NT,r
Template:
NT,r

3. Replacement with sibling string:

Transformation:
IOPT,i => COPT,r2 NT,r1 IOPT,i
Template:
COPT,r2 ; NT,r1 ; IOPT,i

4. Replacement with subtree:

Transformation:
NT,c => NT,r1 COPT,r2 NT,r3
Template:
HEAD,c ( NT,r1 ; COPT,r2 ; NT,r3 )
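The grammar above assumes, but does not define, the interpreter that turns a template into tree-building calls. A minimal C sketch of such an interpreter follows, under these assumptions: fixed-size OP and RULE fields, first-child/right-sibling links with a father pointer, and "(" appending the new child after any existing children. The names tnode, make, symbol and apply_template are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node: OP and RULE fields plus tree links. */
struct tnode {
    char op[8];
    char rule[16];
    struct tnode *father, *child, *sib;
};

static struct tnode *make(struct tnode *father)
{
    struct tnode *n = calloc(1, sizeof *n);

    assert(n != NULL);
    n->father = father;
    return n;
}

/* Read one symbol (operator or rulename) into dst, skipping blanks. */
static const char *symbol(const char *s, char *dst, size_t cap)
{
    size_t i = 0;

    while (isspace((unsigned char)*s))
        s++;
    while (*s && !isspace((unsigned char)*s) && !strchr("(;,)", *s)) {
        if (i + 1 < cap)
            dst[i++] = *s;
        s++;
    }
    dst[i] = '\0';
    return s;
}

/* Apply template t with cur as the current node; returns the final current node. */
struct tnode *apply_template(struct tnode *cur, const char *t)
{
    struct tnode *n;

    t = symbol(t, cur->op, sizeof cur->op);   /* leading operator */
    for (;;) {
        while (isspace((unsigned char)*t))
            t++;
        switch (*t++) {
        case '(':                             /* new child, made current */
            n = make(cur);
            if (cur->child == NULL) {
                cur->child = n;
            } else {
                struct tnode *last = cur->child;
                while (last->sib)
                    last = last->sib;
                last->sib = n;
            }
            cur = n;
            t = symbol(t, cur->op, sizeof cur->op);
            break;
        case ';':                             /* new right sibling, made current */
            n = make(cur->father);
            n->sib = cur->sib;
            cur->sib = n;
            cur = n;
            t = symbol(t, cur->op, sizeof cur->op);
            break;
        case ',':                             /* overwrite the RULE field */
            t = symbol(t, cur->rule, sizeof cur->rule);
            break;
        case ')':                             /* father becomes current */
            cur = cur->father;
            break;
        default:                              /* end of template */
            return cur;
        }
    }
}
```

Applied to a node NT,c, the template of example 4 relabels that node HEAD,c, gives it the three children NT,r1, COPT,r2 and NT,r3, and the final ")" returns the current node to its starting point. Applied to a node IOPT,i, the template of example 3 relabels it COPT,r2 and inserts NT,r1 and IOPT,i as its right siblings.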

APPENDIX D: INTERMEDIATE-LEVEL LANGUAGE DEFINITION GRAMMAR

ild: langname rulelist [extensions] .
rulelist: + rule .
rule: { c-rule ; a-rule ; i-rule ; l-rule } .
c-rule: { c-rule-a ; c-rule-b } .
c-rule-a: c-rulename ":" cdef-a "=>" ctla "=>" csla .
cdef-a: + defpart .
defpart: { rulename ; option ; terminal } .
option: "[" rulename "]" .
ctla: headop "(" freelist ")" .
headop: { head ; pdf } .
head: "HEAD" .
pdf: { (predefined functions) } .
freelist: # freenode ";" ... .
freenode: freeop "," rulename .
freeop: { nt ; copt } .
nt: "NT" .
copt: "COPT" .
csla: + dispart .
dispart: { subtree ; literal ; format } .
subtree: "$" integer [opdisfld] .
opdisfld: { optodf ; pdfodf ; undodf } .
optodf: "="[" rulename "]"" .
pdfodf: "="<" rulename ">"" .
undodf: "="(" rulename ")"" .
c-rule-b: c-rulename ":" cdef-b "=>" ctlb "=>" cslb .
cdef-b: terminal .
ctlb: "HEAD," c-rulename .
cslb: + termpart .
termpart: { literal ; format } .
a-rule: a-rulename ":" adef "=>" atl "=>" at2 "=>" asl "=>" ase .
adef: "{" altlist "}" .
altlist: # alt ";" ... .
alt: altchar ":" rulename .
atl: "ALT," a-rulename .
at2: "{" alt-temp "}" .
alt-temp: # alt-t ";" ... .
alt-t: altchar ":NT," rulename .
asl: "{" a-rulename "}" .
ase: "{" alt-disp "}" .
alt-disp: # alted "|" ... .
alted: altchar ":" rulename .

i-rule: i-rulename ":" idef "=>" it1 "=>" it2 "=>" is1 "=>" is2 .
idef: "+" rulename1 .
it1: "ITER(NT," rulename1 ";IOPT," i-rulename ")" .
it2: "NT," rulename1 ";IOPT," i-rulename .
is1: "$1" .
is2: "(" i-rulename ")" .
l-rule: { l-rule-a ; l-rule-b ; l-rule-c } .
l-rule-a: l-rulename ":" ldef-a "=>" lt1 "=>" lt2a "=>" ls1
          "=>" ls2a "=>" ls3 .
ldef-a: "#" rulename1 rulename2 "..." .
lt2a: "NT," rulename2 ";NT," rulename1 ";LOPT," l-rulename .
ls2a: "$1$2" .
l-rule-b: l-rulename ":" ldef-b "=>" lt1 "=>" lt2b "=>" ls1
          "=>" ls2b "=>" ls3 .
ldef-b: "#" rulename1 "[" rulename2 "]" "..." .
lt2b: "COPT," rulename2 ";NT," rulename1 ";LOPT," l-rulename .
ls2b: "$1="[" rulename2 "]"$2" .
l-rule-c: l-rulename ":" ldef-c "=>" lt1 "=>" lt2c "=>" ls1
          "=>" ls2c "=>" ls3 .
ldef-c: "#" rulename1 terminal "..." .
lt2c: "NT," rulename1 ";LOPT," l-rulename .
ls2c: terminal "$1" .
lt1: "LIST(NT," rulename1 ";LOPT," l-rulename ")" .
ls1: "$1" .
ls3: "(" l-rulename ")" .
format: { newline ; tab ; untab } .
newline: "NL" .
tab: "TB" .
untab: "UT" .
extensions: userpdr userpdf .
userpdr: (undefined) .
userpdf: (undefined) .

APPENDIX E: ILD GRAMMAR LANGUAGE DEFINITION

ild: langname rulelist [extensions]
=> ILD,ild
   (NT,String;
    NT,rulelist;
    COPT,extensions)
=> $1="<langname>" $2 $3="[extensions]" .

rulelist: + rule
=> ITER,rulelist
   (NT,rule;
    IOPT,rulelist)
=> NT,rule;
   IOPT,rulelist
=> $1
=> "(rulelist)" .

rule: { c-rule ; a-rule ; i-rule ; l-rule }
=> ALT,rule
=> { c:NT,c-rule ; a:NT,a-rule ; i:NT,i-rule ; l:NT,l-rule }
=> "{rule}"
=> "{ c:c-rule | a:a-rule | i:i-rule | l:l-rule }" .

c-rule: { c-rule-a ; c-rule-b }
=> ALT,c-rule
=> { a:NT,c-rule-a ; b:NT,c-rule-b }
=> "{c-rule}"
=> "{ a:c-rule-a | b:c-rule-b }" .

c-rule-a: c-rulename ":" cdef-a "=>" ctla "=>" csla
=> HEAD,c-rule-a
   (NT,String;
    NT,cdef-a;
    NT,ctla;
    NT,csla)
=> $1="<c-rulename>" ":" $2 "=>" $3 "=>" $4 .

cdef-a: + defpart
=> ITER,cdef-a
   (NT,defpart;
    IOPT,cdef-a)
=> NT,defpart;
   IOPT,cdef-a
=> $1
=> "(defpart)" .

defpart: { rulename ; option ; terminal }
=> ALT,defpart
=> { r:NT,String ; o:NT,option ; t:NT,terminal }
=> "{defpart}"
=> "{ r:rulename | o:option | t:terminal }" .

option: "[" rulename "]"
=> HEAD,option
   (NT,String)
=> "[" $1="<rulename>" "]" .

ctla: headop "(" freelist ")"
=> HEAD,ctla
   (NT,headop;
    NT,freelist)
=> $1 "(" $2 ")" .

headop: { head ; pdf }
=> ALT,headop
=> { h:NT,head ; p:NT,pdf }
=> "{headop}"
=> "{ h:HEAD | p:pdf }" .

head: "HEAD"
=> HEAD,head
=> "HEAD" .

pdf: { (predefined functions) }
=> ALT,pdf
=> { }
=> "{pdf}"
=> "{ }" .

freelist: # freenode ";" ...
=> LIST,freelist
   (NT,freenode;
    LOPT,freelist)
=> NT,freenode;
   LOPT,freelist
=> $1
=> ";" $1
=> "(freenode)" .

freenode: freeop "," rulename
=> HEAD,freenode
   (NT,freeop;
    NT,String)
=> $1 "," $2="<rulename>" .

freeop: { nt ; copt }
=> ALT,freeop
=> { n:NT,nt ; c:NT,copt }
=> "{freeop}"
=> "{ n:NT | c:COPT }" .

nt: "NT"
=> HEAD,nt
=> "NT" .

copt: "COPT"
=> HEAD,copt
=> "COPT" .

csla: + dispart
=> ITER,csla
   (NT,dispart;
    IOPT,csla)
=> NT,dispart;
   IOPT,csla
=> $1
=> "(dispart)" .

dispart: { subtree ; literal ; format }
=> ALT,dispart
=> { s:NT,subtree ; l:NT,literal ; f:NT,format }
=> "{dispart}"
=> "{ s:subtree | l:literal | f:format }" .

subtree: "$" integer [opdisfld]
=> HEAD,subtree
   (NT,Integer;
    COPT,opdisfld)
=> "$" $1="<integer>" $2="[opdisfld]" .

opdisfld: { optodf ; pdfodf ; undodf }
=> ALT,opdisfld
=> { o:NT,optodf ; p:NT,pdfodf ; u:NT,undodf }
=> "{opdisfld}"
=> "{ o:optodf | p:pdfodf | u:undodf }" .

optodf: "="[" rulename "]""
=> HEAD,optodf
   (NT,String)
=> "="[" $1="<rulename>" "]"" .

pdfodf: "="<" rulename ">""
=> HEAD,pdfodf
   (NT,String)
=> "="<" $1="<rulename>" ">"" .

undodf: "="(" rulename ")""
=> HEAD,undodf
   (NT,String)
=> "="(" $1="<rulename>" ")"" .

c-rule-b: c-rulename ":" cdef-b "=>" ctlb "=>" cslb
=> HEAD,c-rule-b
   (NT,String;
    NT,cdef-b;
    NT,ctlb;
    NT,cslb)
=> $1="<c-rulename>" ":" $2 "=>" $3 "=>" $4 .

cdef-b: terminal
=> HEAD,cdef-b
   (NT,terminal)
=> $1 .

ctlb: "HEAD," c-rulename
=> HEAD,ctlb
   (NT,String)
=> "HEAD," $1="<c-rulename>" .

cslb: + termpart
=> ITER,cslb
   (NT,termpart;
    IOPT,cslb)
=> NT,termpart;
   IOPT,cslb
=> $1
=> "(termpart)" .

termpart: { literal ; format }
=> ALT,termpart
=> { l:NT,literal ; f:NT,format }
=> "{termpart}"
=> "{ l:literal | f:format }" .

a-rule: a-rulename ":" adef
        "=>" atl "=>" at2 "=>" asl "=>" ase
=> HEAD,a-rule
   (NT,String;
    NT,adef;
    NT,atl;
    NT,at2;
    NT,asl;
    NT,ase)
=> $1="<a-rulename>" ":" $2
   "=>" $3 "=>" $4 "=>" $5 "=>" $6 .

adef: "{" altlist "}"
=> HEAD,adef
   (NT,altlist)
=> "{" $1 "}" .

altlist: # alt ";" ...
=> LIST,altlist
   (NT,alt;
    LOPT,altlist)
=> NT,alt;
   LOPT,altlist
=> $1
=> ";" $1
=> "(alt)" .

alt: altchar ":" rulename
=> HEAD,alt
   (NT,Character;
    NT,String)
=> $1="<altchar>" ":" $2="<rulename>" .

atl: "ALT," a-rulename
=> HEAD,atl
   (NT,String)
=> "ALT," $1="<a-rulename>" .

at2: "{" alt-temp "}"
=> HEAD,at2
   (NT,alt-temp)
=> "{" $1 "}" .

alt-temp: # alt-t ";" ...
=> LIST,alt-temp
   (NT,alt-t;
    LOPT,alt-temp)
=> NT,alt-t;
   LOPT,alt-temp
=> $1
=> ";" $1
=> "(alt-t)" .

alt-t: altchar ":NT," rulename
=> HEAD,alt-t
   (NT,Character;
    NT,String)
=> $1="<altchar>" ":NT," $2="<rulename>" .

asl: "{" a-rulename "}"
=> HEAD,asl
   (NT,String)
=> "{" $1="<a-rulename>" "}" .

ase: "{" alt-disp "}"
=> HEAD,ase
   (NT,alt-disp)
=> "{" $1 "}" .

alt-disp: # alted "|" ...
=> LIST,alt-disp
   (NT,alted;
    LOPT,alt-disp)
=> NT,alted;
   LOPT,alt-disp
=> $1
=> "|" $1
=> "(alted)" .

alted: altchar ":" rulename
=> HEAD,alted
   (NT,Character;
    NT,String)
=> $1="<altchar>" ":" $2="<rulename>" .

i-rule: i-rulename ":" idef
        "=>" it1 "=>" it2 "=>" is1 "=>" is2
=> HEAD,i-rule
   (NT,String;
    NT,idef;
    NT,it1;
    NT,it2;
    NT,is1;
    NT,is2)
=> $1="<i-rulename>" ":" $2
   "=>" $3 "=>" $4 "=>" $5 "=>" $6 .

idef: "+" rulename1
=> HEAD,idef
   (NT,String)
=> "+" $1="<rulename1>" .

it1: "ITER(NT," rulename1 ";IOPT," i-rulename ")"
=> HEAD,it1
   (NT,String;
    NT,String)
=> "ITER(NT," $1="<rulename1>" ";IOPT," $2="<i-rulename>" ")" .

it2: "NT," rulename1 ";IOPT," i-rulename
=> HEAD,it2
   (NT,String;
    NT,String)
=> "NT," $1="<rulename1>" ";IOPT," $2="<i-rulename>" .

is1: "$1"
=> HEAD,is1
=> "$1" .

is2: "(" i-rulename ")"
=> HEAD,is2
   (NT,String)
=> "(" $1="<i-rulename>" ")" .

l-rule: { l-rule-a ; l-rule-b ; l-rule-c }
=> ALT,l-rule
=> { a:NT,l-rule-a ; b:NT,l-rule-b ; c:NT,l-rule-c }
=> "{l-rule}"
=> "{ a:l-rule-a | b:l-rule-b | c:l-rule-c }" .

l-rule-a: l-rulename ":" ldef-a
          "=>" lt1 "=>" lt2a "=>" ls1 "=>" ls2a "=>" ls3
=> HEAD,l-rule-a
   (NT,String;
    NT,ldef-a;
    NT,lt1;
    NT,lt2a;
    NT,ls1;
    NT,ls2a;
    NT,ls3)
=> $1="<l-rulename>" ":" $2
   "=>" $3 "=>" $4 "=>" $5 "=>" $6 "=>" $7 .

ldef-a: "#" rulename1 rulename2 "..."
=> HEAD,ldef-a
   (NT,String;
    NT,String)
=> "#" $1="<rulename1>" $2="<rulename2>" "..." .

lt2a: "NT," rulename2 ";NT," rulename1
      ";LOPT," l-rulename
=> HEAD,lt2a
   (NT,String;
    NT,String;
    NT,String)
=> "NT," $1="<rulename2>" ";NT," $2="<rulename1>"
   ";LOPT," $3="<l-rulename>" .

ls2a: "$1$2"
=> HEAD,ls2a
=> "$1$2" .

l-rule-b: l-rulename ":" ldef-b
          "=>" lt1 "=>" lt2b "=>" ls1 "=>" ls2b "=>" ls3
=> HEAD,l-rule-b
   (NT,String;
    NT,ldef-b;
    NT,lt1;
    NT,lt2b;
    NT,ls1;
    NT,ls2b;
    NT,ls3)
=> $1="<l-rulename>" ":" $2
   "=>" $3 "=>" $4 "=>" $5 "=>" $6 "=>" $7 .

ldef-b: "#" rulename1 "[" rulename2 "]" "..."
=> HEAD,ldef-b
   (NT,String;
    COPT,String)
=> "#" $1="<rulename1>" "[" $2="<rulename2>" "]" "..." .

lt2b: "COPT," rulename2 ";NT," rulename1
      ";LOPT," l-rulename
=> HEAD,lt2b
   (COPT,String;
    NT,String;
    NT,String)
=> "COPT," $1="<rulename2>" ";NT," $2="<rulename1>"
   ";LOPT," $3="<l-rulename>" .

ls2b: "$1="[" rulename2 "]"$2"
=> HEAD,ls2b
   (NT,String)
=> "$1="[" $1="<rulename2>" "]"$2" .

l-rule-c: l-rulename ":" ldef-c
          "=>" lt1 "=>" lt2c "=>" ls1 "=>" ls2c "=>" ls3
=> HEAD,l-rule-c
   (NT,String;
    NT,ldef-c;
    NT,lt1;
    NT,lt2c;
    NT,ls1;
    NT,ls2c;
    NT,ls3)
=> $1="<l-rulename>" ":" $2
   "=>" $3 "=>" $4 "=>" $5 "=>" $6 "=>" $7 .

ldef-c: "#" rulename1 terminal "..."
=> HEAD,ldef-c
   (NT,String;
    NT,terminal)
=> "#" $1="<rulename1>" $2 "..." .
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l-t2c: "NT," rulename1 "; LOPT," l-rulename
   => HEAD,l-t2c
      (NT,String;
       NT,String)
   => "NT," $1="<rulename1>" "; LOPT," $2="<l-rulename>" .


l-s2c: terminal "$1"
   => HEAD,l-s2c
      (NT,terminal)
   => $1="<terminal>" "$1" .


l-t1: "LIST ( NT," rulename1 "; LOPT," l-rulename ")"
   => HEAD,l-t1
      (NT,String;
       NT,String)
   => "LIST ( NT," $1="<rulename1>" "; LOPT,"
      $2="<l-rulename>" ")" .


l-s1: "$1$2"
   => HEAD,l-s1
   => "$1$2" .


l-s3: "(" l-rulename ")"
   => HEAD,l-s3
      (l-rulename)
   => "(" $1="<l-rulename>" ")" .
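The l-rule templates above take a list definition of the form `# r1 x ...` and emit two labelled transformations for it: l-t1 for the list node itself, and one of l-t2a, l-t2b or l-t2c for the LOPT node, depending on whether the separator x is a rule r2, an optional rule (r2), or a terminal. A minimal Python sketch of that text generation, assuming a hypothetical function `list_transformations` and a string encoding of the separator kind; the exact spacing of the emitted text is also an assumption.

```python
from typing import Literal

# Hypothetical encoding of the three separator kinds that the
# l-rule-a, l-rule-b and l-rule-c templates distinguish.
SeparatorKind = Literal["rule", "optional", "terminal"]


def list_transformations(l_rulename: str, rulename1: str,
                         kind: SeparatorKind, separator: str) -> tuple[str, str]:
    """Return the (t1, t2) transformation texts for the list rule
    "l_rulename: # rulename1 x ..." with separator x.

    t1 follows the l-t1 template; t2 follows l-t2a, l-t2b or l-t2c.
    """
    t1 = f"LIST ( NT,{rulename1}; LOPT,{l_rulename})"
    if kind == "rule":          # l-rule-a: x is a rule name r2
        t2 = f"NT,{separator}; NT,{rulename1}; LOPT,{l_rulename}"
    elif kind == "optional":    # l-rule-b: x is "(" r2 ")"
        t2 = f"COPT,{separator}; NT,{rulename1}; LOPT,{l_rulename}"
    elif kind == "terminal":    # l-rule-c: x is a terminal, which gets no node
        t2 = f"NT,{rulename1}; LOPT,{l_rulename}"
    else:
        raise ValueError(f"unknown separator kind: {kind}")
    return t1, t2


if __name__ == "__main__":
    t1, t2 = list_transformations("idlist", "id", "terminal", '","')
    print(t1)  # LIST ( NT,id; LOPT,idlist)
    print(t2)  # NT,id; LOPT,idlist
```

Note that in the terminal case the separator produces no tree node, which is why l-t2c has one component fewer than l-t2a and l-t2b.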
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terminal: literal
   => HEAD,terminal
      (NT,String)
   => $1="<terminal>" .


literal:
   => HEAD,literal
      (NT,String)
   => $1="<literal>" .


format: { newline | tab | untab }
   => ALT,format
   => { n:NT,newline
      ; t:NT,tab
      ; u:NT,untab }
   => "{format}"
   => "{ n:newline | t:tab | u:untab }" .


newline: "NL"
   => HEAD,newline
   => "NL" .


tab: "TB"
   => HEAD,tab
   => "TB" .
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untab: "UT"
   => HEAD,untab
   => "UT" .
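The newline, tab and untab rules give display schemas the format codes NL, TB and UT. The appendix does not define their run-time effect; one reading consistent with the memo schemas (NL TB UT $1 for a paragraph, NL TB TB TB before the closing) is that NL starts a new line at the current indentation, TB emits one indent unit and raises the indentation for later lines, and UT lowers it again. A minimal Python sketch under that assumption; the function name `render_format` and the four-space indent unit are hypothetical.

```python
def render_format(tokens: list[str], unit: str = "    ") -> str:
    """Interpret NL/TB/UT format codes mixed with literal text tokens.

    Assumed semantics (not stated in the appendix):
      NL -> newline, then indent to the current level
      TB -> emit one indent unit now and raise the level for later lines
      UT -> lower the level for later lines (text already emitted is kept)
    Any other token is copied through as literal text.
    """
    out: list[str] = []
    level = 0
    for tok in tokens:
        if tok == "NL":
            out.append("\n" + unit * level)
        elif tok == "TB":
            out.append(unit)
            level += 1
        elif tok == "UT":
            level = max(0, level - 1)
        else:
            out.append(tok)
    return "".join(out)


if __name__ == "__main__":
    # Paragraph schema NL TB UT $1: only the first line is indented.
    print(render_format(["NL", "TB", "UT", "First line,", "NL", "second line."]))
    # NL TB TB TB before the closing "Sincerely," NL name.
    print(render_format(["NL", "TB", "TB", "TB", "Sincerely,", "NL", "J. Smith"]))
```

Under this reading, NL TB UT indents only the first line of each paragraph, while NL TB TB TB indents every line of the closing block by three units.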


extensions: userpdr userpdf
   => HEAD,extensions
      (NT,userpdr;
       NT,userpdf)
   => $1 $2 .


userpdr: (undefined) .


userpdf: (undefined) .


String, Integer, and Character are system predefined rules.
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APPENDIX F. MEMORANDUM LANGUAGE DEFINITION


The following Language Definition, constructed by hand,
illustrates the templates and schemas required for the
definition of a simple grammar. When realized as an AST via
the ILD Grammar Directed Editor and interpreted by the system
predefined function ILD, this Language Definition could be
installed in the Language Definition Module as part of a
Memorandum GDE.


memo: (salutation) body [closing]
   =>ILD,memo
      (COPT,salutation;
       NT,body;
       COPT,closing)
   =>NL $1="(salutation)" $2 NL TB TB TB $3="[closing]" .


salutation: "Dear" name ","
   =>HEAD,salutation
      (NT,String)
   =>"Dear" $1="<name>" "," .


body: + paragraph
   =>ITER,body
      (NT,paragraph;
       IOPT,body)
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   =>NT,paragraph
      IOPT,body
   =>NL TB UT $1
   =>NL "(paragraph)" .


paragraph: + line
   =>ITER,paragraph
      (NT,String;
       IOPT,paragraph)
   =>NT,String
      IOPT,paragraph
   =>$1="<line>" NL
   =>"(line)" NL .


closing: "Sincerely," name
   =>HEAD,closing
      (NT,String)
   =>"Sincerely," NL $1="<name>" .


String is a system predefined rule.
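The memo schemas show how placeholders keep a partially complete tree displayable: a reference such as $1="<name>" prints the placeholder text until that component has been supplied. A minimal Python sketch of schema filling, assuming a hypothetical list encoding in which literal text is a string and $k="placeholder" is the tuple ("$", k, placeholder), with `None` marking a component not yet expanded. Spacing is written into the literals, and NL in the closing schema is approximated by a newline character.

```python
from typing import Optional, Union

# A schema item is literal text, or ("$", k, placeholder) for $k="placeholder".
SchemaItem = Union[str, tuple[str, int, str]]


def render_schema(schema: list[SchemaItem],
                  components: list[Optional[str]]) -> str:
    """Fill a display schema; an unexpanded component (None) shows its placeholder."""
    parts: list[str] = []
    for item in schema:
        if isinstance(item, str):
            parts.append(item)              # literal text from the schema
        else:
            _, k, placeholder = item
            value = components[k - 1]       # $k is 1-based
            parts.append(placeholder if value is None else value)
    return "".join(parts)


# Transcriptions of the salutation and closing schemas above
# (spaces added to the literals; NL approximated by "\n").
SALUTATION: list[SchemaItem] = ["Dear ", ("$", 1, "<name>"), ","]
CLOSING: list[SchemaItem] = ["Sincerely,\n", ("$", 1, "<name>")]

if __name__ == "__main__":
    print(render_schema(SALUTATION, [None]))       # Dear <name>,
    print(render_schema(SALUTATION, ["Ms. Lee"]))  # Dear Ms. Lee,
    print(render_schema(CLOSING, [None]))
```

Because an absent component simply falls back to its placeholder, the same schema serves both a finished memo and one still being synthesized.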
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APPENDIX G. SYSTEM PREDEFINED FUNCTIONS


The following is a list of programming language primitive
operations, derived from (Pratt, 1975), which could be
implemented as System Predefined Functions. This list is not
intended as a comprehensive collection of the primitives
desired, or even required, for implementation of a GDE
system. Rather, these functions are presented here as an
indication of the classes of operations which might be made
available in support of users of the GDE.


Synthesis Operators


|, Se 

2. COPT
5. Pert 
4. ne | 
5. ALT

ae ea 
7. HEAD
8. ITER
9. LIST


Arithmetic Operators 


10. PLUS
11. MINUS
2 Roto leat 1 on 





13. DIV division


14. REM remainder
15. UPLUS unary plus
16. UMINUS unary minus 


Relational Operators 


17. EQUAL equality 

18. NTEQ not equal 

19. GT greater than

20. LT less than

21. GTE greater than or equal
22. LTE less than or equal


Boolean Operators 


23. AND 
24. OR
25. NOT


Assignment Operators 
26. ASNA arithmetic assignment
27. ASNS string assignment


Sequence Control Operators 


28. COND if-then-else conditional
29. LOOP generalized loop
30. CASE 
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Symbol Table and Data Element Operators


32. DECLARE declaration 


33. BLOCK 
34. IDENT identifier
35. NUMBER 


36. STRING 


System Operators 


37. ILD AST to Language Definition translation


Miscellaneous 


38. NOP null operation 
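As an illustration of how some of these primitives might be attached to tree nodes, here is a minimal Python sketch of an evaluator for the arithmetic, relational and Boolean operators, plus COND and NOP. The nested-tuple node encoding and the function name `evaluate` are hypothetical, and using Python's floor division and modulus for DIV and REM is an assumption, not something the list specifies.

```python
import operator
from typing import Any, Callable

# Assumed node encoding: (OPERATOR, operand, ...) tuples with plain
# ints/bools as leaves. Operator names follow the list above.
BINARY: dict[str, Callable[[Any, Any], Any]] = {
    "PLUS": operator.add,
    "MINUS": operator.sub,
    "DIV": operator.floordiv,   # integer (floor) division assumed
    "REM": operator.mod,        # remainder consistent with floor division
    "EQUAL": operator.eq,
    "NTEQ": operator.ne,
    "GT": operator.gt,
    "LT": operator.lt,
    "GTE": operator.ge,
    "LTE": operator.le,
}
UNARY: dict[str, Callable[[Any], Any]] = {
    "UPLUS": operator.pos,
    "UMINUS": operator.neg,
    "NOT": operator.not_,
}


def evaluate(node: Any) -> Any:
    """Recursively evaluate a small expression tree."""
    if not isinstance(node, tuple):
        return node                          # leaf: literal value
    op, *args = node
    if op == "NOP":
        return None                          # null operation
    if op == "AND":                          # short-circuit evaluation
        return bool(evaluate(args[0])) and bool(evaluate(args[1]))
    if op == "OR":
        return bool(evaluate(args[0])) or bool(evaluate(args[1]))
    if op == "COND":                         # (COND, test, then, else)
        test, then_branch, else_branch = args
        return evaluate(then_branch) if evaluate(test) else evaluate(else_branch)
    if op in UNARY:
        return UNARY[op](evaluate(args[0]))
    if op in BINARY:
        return BINARY[op](evaluate(args[0]), evaluate(args[1]))
    raise ValueError(f"unsupported operator: {op}")


if __name__ == "__main__":
    print(evaluate(("PLUS", 2, ("DIV", 9, 2))))
    print(evaluate(("COND", ("GT", 7, 3), ("REM", 7, 3), ("UMINUS", 1))))
```

COND, AND and OR are handled before the generic tables so that only the selected branch or the needed operand is evaluated.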







APPENDIX H. FIGURES 


[Figure: parse tree rooted at <root> for a trivial program headed
"program tree (input,output);", with subtrees for the program
heading, the var declaration part, and the begin <statements> end
block.]

Note: non-terminal names have been abbreviated.


Figure 1. Parse tree for a trivial program. 
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CONCATENATION:
   c : x1 x2 ... xn ,   xk in { rk, "(" rk ")", t }

   <c> => y1 y2 ... yn ,   where yk = <rk>       if xk = rk
                                 yk = copt(rk)   if xk = "(" rk ")"

   copt(r) => <r>

ALTERNATION:
   a : { r1 | r2 | ... | rn }

   <a> => { <r1> | <r2> | ... | <rn> }

ITERATION:
   i : + r

   <i> => <r> iopt(i)

   iopt(i) => <r> iopt(i)

LIST:
   l : # r1 x ... ,   x in { r2, "(" r2 ")", t }

   <l> => <r1> lopt(l)

              <r2> <r1> lopt(l)         if x = r2
   lopt(l) => copt(r2) <r1> lopt(l)     if x = "(" r2 ")"
              t <r1> lopt(l)            if x = t

PREDEFINED:
   p : pdf

   <p> => pdf(p)

UNDEFINED:
   <u> => <u>

c in C = { concatenation rules }
a in A = { alternation rules }
i in I = { iteration rules }
l in L = { list rules }
p in P = { predefined rules }
u in U = { undefined rules }
r in R = { C,A,I,L,P,U }
t in T = { terminal symbols }


Figure 2. Transformations
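The transformations of Figure 2 can be read as rewrite rules applied to one symbol of a sentential form at a time. A minimal Python sketch of a single rewrite step follows; the tuple encoding of rules and symbols and the function name `expand` are hypothetical. For <a> the caller picks the alternative, and for copt, iopt and lopt the caller decides whether to expand or drop the option. Terminals inside a concatenation are carried through unchanged, which is an assumption since the figure lists only the rk and (rk) cases.

```python
# Hypothetical rule encoding, one shape per rule class of Figure 2:
#   ("concat", items)   items: name | ("opt", name) | ("term", text)
#   ("alt", names)
#   ("iter", name)
#   ("list", r1, sep)   sep: ("rule", r2) | ("opt", r2) | ("term", text)
#   ("pdf",)            predefined rule
# A name with no entry in the grammar is an undefined rule.
Symbol = tuple[str, str]  # ("nt" | "copt" | "iopt" | "lopt" | "pdf" | "t", name)


def expand(sym: Symbol, grammar: dict, choice: int = 0) -> list[Symbol]:
    """Apply one Figure 2 transformation to a single symbol.

    For <a>, `choice` picks the alternative; for copt/iopt/lopt,
    choice 0 expands the option and any other value drops it (empty).
    """
    kind, name = sym
    if kind == "copt":
        return [] if choice else [("nt", name)]              # copt(r) => <r>
    if kind == "iopt":                                        # iopt(i) => <r> iopt(i)
        return [] if choice else [("nt", grammar[name][1]), ("iopt", name)]
    if kind == "lopt":
        if choice:
            return []
        _, r1, (sep_kind, sep) = grammar[name]
        head = {"rule": [("nt", sep)], "opt": [("copt", sep)],
                "term": [("t", sep)]}[sep_kind]
        return head + [("nt", r1), ("lopt", name)]            # lopt(l) => x <r1> lopt(l)
    if kind != "nt":
        return [sym]                                           # terminals and pdf(p) are final
    rule = grammar.get(name)
    if rule is None:
        return [sym]                                           # <u> => <u>
    if rule[0] == "concat":
        out: list[Symbol] = []
        for item in rule[1]:
            if isinstance(item, str):
                out.append(("nt", item))                       # xk = rk
            elif item[0] == "opt":
                out.append(("copt", item[1]))                  # xk = "(" rk ")"
            else:
                out.append(("t", item[1]))                     # terminal, copied (assumed)
        return out
    if rule[0] == "alt":
        return [("nt", rule[1][choice])]                       # one of <r1> ... <rn>
    if rule[0] in ("iter", "list"):
        opt = "iopt" if rule[0] == "iter" else "lopt"
        return [("nt", rule[1]), (opt, name)]                  # <i> => <r> iopt(i), <l> => <r1> lopt(l)
    if rule[0] == "pdf":
        return [("pdf", name)]                                 # <p> => pdf(p)
    raise ValueError(f"unknown rule kind: {rule[0]}")


if __name__ == "__main__":
    G = {
        "memo": ("concat", [("opt", "salutation"), "body", ("opt", "closing")]),
        "body": ("iter", "paragraph"),
        "idlist": ("list", "id", ("term", ",")),
    }
    print(expand(("nt", "memo"), G))
    print(expand(("lopt", "idlist"), G))
```

Keeping the choice outside `expand` mirrors the editor setting, where the user, not the grammar, decides which alternative or option to take next.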


CONCATENATION:
   c : x1 x2 ... xn ,   xk in { rk, "(" rk ")", t }

   NT,c => y1 y2 ... yn ,   where yk = NT,rk     if xk = rk
                                  yk = COPT,rk   if xk = "(" rk ")"

   COPT,r => NT,r

ALTERNATION:
   a : { r1 | r2 | ... | rn }

   NT,a => { NT,r1 ; NT,r2 ; ... ; NT,rn }

ITERATION:
   i : + r

   NT,i => NT,r IOPT,i

   IOPT,i => NT,r IOPT,i

LIST:
   l : # r1 x ... ,   x in { r2, "(" r2 ")", t }

   NT,l => NT,r1 LOPT,l

             NT,r2 NT,r1 LOPT,l       if x = r2
   LOPT,l => COPT,r2 NT,r1 LOPT,l     if x = "(" r2 ")"
             NT,r1 LOPT,l             if x = t

PREDEFINED:
   p : pdf

   NT,p => PDF(p),p

UNDEFINED:
   NT,u => NT,u

c in C = { concatenation rules }
a in A = { alternation rules }
i in I = { iteration rules }
l in L = { list rules }
p in P = { predefined rules }
u in U = { undefined rules }
r in R = { C,A,I,L,P,U }
t in T = { terminal symbols }


Figure 3. Labelled Transformations







CONCATENATION:
   c : x1 x2 ... xn ,   xk in { rk, "(" rk ")", t }

   NT,c => y1 y2 ... yn ,   where yk = NT,rk     if xk = rk
                                  yk = COPT,rk   if xk = "(" rk ")"

   COPT,r => NT,r

ALTERNATION:
   a : { r1 | r2 | ... | rn }

   NT,a => ALT,a

   ALT,a => { NT,r1 ; NT,r2 ; ... ; NT,rn }

ITERATION:
   i : + r

   NT,i => NT,r IOPT,i

   IOPT,i => NT,r IOPT,i

LIST:
   l : # r1 x ... ,   x in { r2, "(" r2 ")", t }

   NT,l => NT,r1 LOPT,l

             NT,r2 NT,r1 LOPT,l       if x = r2
   LOPT,l => COPT,r2 NT,r1 LOPT,l     if x = "(" r2 ")"
             NT,r1 LOPT,l             if x = t

PREDEFINED:
   p : pdf

   NT,p => TERM,p

   TERM,p => PDF(p),p

UNDEFINED:
   NT,u => NT,u

c in C = { concatenation rules }
a in A = { alternation rules }
i in I = { iteration rules }
l in L = { list rules }
p in P = { predefined rules }
u in U = { undefined rules }
r in R = { C,A,I,L,P,U }
t in T = { terminal symbols }


Figure 4. Extended Transformations
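Figure 4 differs from Figure 3 mainly in the intermediate ALT and TERM nodes it introduces for alternation and predefined rules. A minimal Python sketch computing the labelled right-hand sides under the extended scheme; the tuple rule encoding and the function names `extended_nt`, `extended_alt` and `extended_lopt` are hypothetical. Applied to the memo rule of Appendix F, `extended_nt` yields the component list COPT,salutation; NT,body; COPT,closing.

```python
# Hypothetical rule encoding:
#   ("concat", items)   items: name | ("opt", name) | ("term", text)
#   ("alt", names) | ("iter", name) | ("list", r1, sep) | ("pdf",)
#   list separators: ("rule", r2) | ("opt", r2) | ("term", text)
Label = tuple[str, str]  # (node type, rule name), e.g. ("COPT", "closing")


def extended_nt(name: str, grammar: dict) -> list[Label]:
    """Right-hand side of NT,name under the extended transformations."""
    rule = grammar.get(name)
    if rule is None:
        return [("NT", name)]                        # NT,u => NT,u
    kind = rule[0]
    if kind == "concat":
        labels: list[Label] = []
        for item in rule[1]:
            if isinstance(item, str):
                labels.append(("NT", item))          # xk = rk
            elif item[0] == "opt":
                labels.append(("COPT", item[1]))     # xk = "(" rk ")"
            # terminals produce no node
        return labels
    if kind == "alt":
        return [("ALT", name)]                       # NT,a => ALT,a
    if kind == "iter":
        return [("NT", rule[1]), ("IOPT", name)]     # NT,i => NT,r IOPT,i
    if kind == "list":
        return [("NT", rule[1]), ("LOPT", name)]     # NT,l => NT,r1 LOPT,l
    if kind == "pdf":
        return [("TERM", name)]                      # NT,p => TERM,p
    raise ValueError(f"unknown rule kind: {kind}")


def extended_alt(name: str, grammar: dict) -> list[Label]:
    """ALT,a => { NT,r1 ; ... ; NT,rn }: the choices offered for an ALT node."""
    return [("NT", r) for r in grammar[name][1]]


def extended_lopt(name: str, grammar: dict) -> list[Label]:
    """LOPT,l => one more separator/element pair, as in l-t2a, l-t2b, l-t2c."""
    _, r1, (sep_kind, sep) = grammar[name]
    head = {"rule": [("NT", sep)], "opt": [("COPT", sep)], "term": []}[sep_kind]
    return head + [("NT", r1), ("LOPT", name)]


if __name__ == "__main__":
    G = {"memo": ("concat", [("opt", "salutation"), "body", ("opt", "closing")])}
    print(extended_nt("memo", G))
```

Splitting the ALT and TERM steps out gives the editor a distinct node at which to offer the user a choice of alternative, or to hand control to a predefined function.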
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